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Streptococcus pneumoniae Polynucleotides and Sequences 

FIELD OF THE INVENTION 

5 The present invention relates to the field of molecular biology. In 

particular, it relates to, among other things, nucleotide sequences of Streptococcus 
pneumoniae, contigs, ORFs, fragments, probes, primers and related 
polynucleotides thereof, peptides and polypeptides encoded by the sequences, and 
uses of the polynucleotides and sequences thereof, such as in fermentation, 
10 polypeptide production, assays and pharmaceutical development, among others. 

BACKGROUND OF THE INVENTION 

Streptococcus pneumoniae has been one of the most extensively studied 

15 microorganisms since its first isolation in 1881. It was the object of many 
investigations that led to important scientific discoveries. In 1928, Griffith 
observed that when heat-killed encapsulated pneumococci and live strains 
constitutively lacking any capsule were concomitantly injected into mice, the 
nonencapsulated could be converted into encapsulated pneumococci with the same 

20 capsular type as the heat-killed strain. Years later, the nature of this "transforming 
principle," or carrier of genetic information, was shown to be DNA. (Avery, 0.1. , 
etal.J. Exp. Med., 79:137-157 (1944)). 

In spite of the vast number of publications on S. pneumoniae many 
questions about its virulence are still unanswered, and this pathogen remains a 

25 major causative agent of serious human disease, especially community-acquired 
pneumonia. (Johnston, R.B., et al, Rev. Infect. Dis. 7i(Suppl. 6):S509-517 
(1991)). In addition, in developing countries, the pneumococcus is responsible for 
the death of a large number of children under the age of 5 years from pneumococcal 
pneumonia. The incidence of pneumococcal disease is highest in infants under 2 

30 years of age and in people over 60 years of age. Pneumococci are the second most 
frequent cause (after Haemophilus influenzae type b) of bacterial meningitis and 
otitis media in children. With the recent introduction of conjugate vaccines for H. 
influenzae type b, pneumococcal meningitis is likely to become increasingly 
prominent. S. pneumoniae is the most important etiologic agent of community- 
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acquired pneumonia in adults and is the second most common cause of bacterial 
meningitis behind Neisseria meningitidis. 

The antibiotic generally prescribed to treat S. pneumoniae is 
benzylpenicillin, although resistance to this and to other antibiotics is found 
5 occasionally. Pneumococcal resistance to penicillin results from mutations in its 
penicillin-binding proteins. In uncomplicated pneumococcal pneumonia caused by 
a sensitive strain, treatment with penicillin is usually successful unless started too 
late. Erythromycin or clindamycin can be used to treat pneumonia in patients 
hypersensitive to penicillin, but resistant strains to these drugs exist. Broad 
10 spectrum antibiotics (e.g., the tetracyclines) may also be effective, although 
tetracycline-resistant strains are not rare. In spite of the availability of antibiotics, 
the mortality of pneumococcal bacteremia in the last four decades has remained 
stable between 25 and 29%. (Gillespie, S.H., et ai, J. Med. Microbiol. 28:237- 
248 (1989). 

15 S. pneumoniae is carried in the upper respiratory tract by many healthy 

individuals. It has been suggested that attachment of pneumococci is mediated by a 
disaccharide receptor on fibronectin, present on human pharyngeal epithelial cells. 
(Anderson, B.J., et ai, J. Immunol. 742:2464-2468 (1989). The mechanisms by 
which pneumococci translocate from the nasopharynx to the lung, thereby causing 

20 pneumonia, or migrate to the blood, giving rise to bacteremia or septicemia, are 
poorly understood. (Johnston, R.B., et ai, Rev. Infect. Dis. /3(Suppl. 6):S509- 
517(1991). 

Various proteins have been suggested to be involved in the pathogenicity of 
S. pneumoniae, however, only a few of them have actually been confirmed as 

25 virulence factors. Pneumococci produce an IgAl protease that might interfere with 
host defense at mucosal surfaces. (Kornfield, S.J., et ai, Rev. Inf. Dis. 3:521- 
534 (1981). S. pneumoniae also produces neuraminidase, an enzyme that may 
facilitate attachment to epithelial cells by cleaving sialic acid from the host 
glycolipids and gangliosides. Partially purified neuraminidase was observed to 

30 induce meningitis-like symptoms in mice; however, the reliability of this finding 
has been questioned because the neuraminidase preparations used were probably 
contaminated with cell wall products. Other pneumococcal proteins besides 
neuraminidase are involved in the adhesion of pneumococci to epithelial and 
endothelial cells. These pneumococcal proteins have as yet not been identified. 

35 Recently, Cundell et- ai, reported that peptide permeases can modulate 
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pneumococcal adherence to epithelial and endothelial cells. It was, however, 
unclear whether these permeases function directly as adhesions or whether they 
enhance adherence by modulating the expression of pneumococcal adhesions. 
(DeVelasco, E.A., et ai, Micro. Rev. 59:591-603 (1995). A better understanding 
5 of the virulence factors determining its pathogenicity will need to be developed to 
cope with the devastating effects of pneumococcal disease in humans. 

Ironically, despite the prominent role of S. pneumoniae in the discovery of 
DNA, little is known about the molecular genetics of the organism. The S. 
pneumoniae genome consists of one circular, covalently closed, double-stranded 

10 DNA and a collection of so-called variable accessory elements, such as prophages, 
plasmids, transposons and the like. Most physical characteristics and almost all of 
the genes of 5. pneumoniae are unknown. Among the few that have been 
identified, most have not been physically mapped or characterized in detail. Only a 
few genes of this organism have been sequenced. (See, for instance current 

15 versions of GENBANK and other nucleic acid databases, and references that relate 
to the genome of S. pneumoniae such as those set out elsewhere herein.) 

It is clear that the etiology of diseases mediated or exacerbated by S. 
pneumoniae, infection involves the programmed expression of S. pneumoniae 
genes, and that characterizing the genes and their patterns of expression would add 

20 dramatically to our understanding of the organism and its host interactions. 
Knowledge of S. pneumoniae genes and genomic organization would improve our 
understanding of disease etiology and lead to improved and new ways of 
preventing, ameliorating, arresting and reversing diseases. Moreover, 
characterized genes and genomic fragments of S. pneumoniae would provide 

25 reagents for, among other things, detecting, characterizing and controlling 5. 
pneumoniae infections. There is a need to characterize the genome of S. 
pneumoniae and for polynucleotides of this organism. 
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SUMMARY OF THE INVENTION 

The present invention is based on the sequencing of fragments of the 
5 Streptococcus pneumoniae genome. The primary nucleotide sequences which were 
generated are provided in SEQ ID NOS: 1 -39 1 . 

The present invention provides the nucleotide sequence of several hundred 
contigs of the Streptococcus pneumoniae genome, which are listed in tables below 
and set out in the Sequence Listing submitted herewith, and representative 

10 fragments thereof, in a form which can be readily used, analyzed, and interpreted 
by a skilled artisan. In one embodiment, the present invention is provided as 
contiguous strings of primary sequence information corresponding to the 
nucleotide sequences depicted in SEQ ID NOS: 1-391 . 

The present invention further provides nucleotide sequences which are at 

15 least 95% identical to the nucleotide sequences of SEQ ID NOS: 1-391. 

The nucleotide sequence of SEQ ID NOS: 1-391, a representative fragment 
thereof, or a nucleotide sequence which is at least 95% identical to the nucleotide 
sequence of SEQ ID NOS: 1-391 may be provided in a variety of mediums to 
facilitate its use. In one application of this embodiment, the sequences of the 

20 present invention are recorded on computer readable media. Such media includes, 
but is not limited to: magnetic storage media, such as floppy discs, hard disc 
storage medium, and magnetic tape; optical storage media such as CD-ROM; 
electrical storage media such as RAM and ROM; and hybrids of these categories 
such as magnetic/optical storage media. 

25 The present invention further provides systems, particularly computer- 

based systems which contain the sequence information herein described stored in a 
data storage means. Such systems are designed to identify commercially important 
fragments of the Streptococcus pneumoniae genome. 

Another embodiment of the present invention is directed to fragments of the 

30 Streptococcus pneumoniae genome having particular structural or functional 
attributes. Such fragments of the Streptococcus pneumoniae genome of the present 
invention include, but are not limited to, fragments which encode peptides, 
hereinafter referred to as open reading frames or ORFs, fragments which modulate 
the expression of an operably linked ORF, hereinafter referred to as expression 

35 modulating fragments or EMFs, and fragments which can be used to diagnose the 
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presence of Streptococcus pneumoniae in a sample, hereinafter referred to as 
diagnostic fragments or DFs. 

Each of the ORFs in fragments of the Streptococcus pneumoniae genome 
disclosed in Tables 1-3, and the EMFs found 5' to the ORFs, can be used in 
5 numerous ways as polynucleotide reagents. For instance, the sequences can be 
used as diagnostic probes or amplification primers for detecting or determining the 
presence of a specific microbe in a sample, to selectively control gene expression in 
a host and in the production of polypeptides, such as polypeptides encoded by 
ORFs of the present invention, particular those polypeptides that have a 

10 pharmacological activity. 

The present invention further includes recombinant constructs comprising 
one or more fragments of the Streptococcus pneumoniae genome of the present 
invention. The recombinant constructs of the present invention comprise vectors, 
such as a plasmid or viral vector, into which a fragment of the Streptococcus 

15 pneumoniae has been inserted. 

The present invention further provides host cells containing any of the 
isolated fragments of the Streptococcus pneumoniae genome of the present 
invention. The host cells can be a higher eukaryotic host cell, such as a mammalian 
cell, a lower eukaryotic cell, such as a yeast cell, or a procaryotic cell such as a 

20 bacterial cell. 

The present invention is further directed to isolated polypeptides and 
proteins encoded by ORFs of the present invention. A variety of methods, well 
known to those of skill in the art, routinely may be utilized to obtain any of the 
polypeptides and proteins of the present invention. For instance, polypeptides and 

25 proteins of the present invention having relatively short, simple amino acid 
sequences readily can be synthesized using commercially available automated 
peptide synthesizers. Polypeptides and proteins of the present invention also may 
be purified from bacterial cells which naturally produce the protein. Yet another 
alternative is to purify polypeptide and proteins of the present invention from cells 

30 which have been altered to express them. 

The invention further provides methods of obtaining homologs of the 
fragments of the Streptococcus pneumoniae genome of the present invention and 
homologs of the proteins encoded by the ORFs of the present invention. 
Specifically, by using the nucleotide and amino acid sequences disclosed herein as 
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a probe or as primers, and techniques such as PCR cloning and colony/plaque 
hybridization, one skilled in the art can obtain homologs. 

The invention further provides antibodies which selectively bind 
polypeptides and proteins of the present invention. Such antibodies include both 
5 monoclonal and polyclonal antibodies. 

The invention further provides hybridomas which produce the above- 
described antibodies. A hybridoma is an immortalized cell line which is capable of 
secreting a specific monoclonal antibody. 

The present invention further provides methods of identifying test samples 
10 derived from cells which express one of the ORFs of the present invention, or a 
homolog thereof. Such methods comprise incubating a test sample with one or 
more of the antibodies of the present invention, or one or more of the DFs of the 
present invention, under conditions which allow a skilled artisan to determine if the 
sample contains the ORF or product produced therefrom. 
1 5 hi another embodiment of the present invention, kits are provided which 

contain the necessary reagents to carry out the above-described assays. 

Specifically, the invention provides a compartmentalized kit to receive, in 
close confinement, one or more containers which comprises: (a) a first container 
comprising one of the antibodies, or one of the DFs of the present invention; and 
20 (b) one or more other containers comprising one or more of the following: wash 
reagents, reagents capable of detecting presence of bound antibodies or hybridized 
DFs. 

Using the isolated proteins of the present invention, the present invention 
further provides methods of obtaining and identifying agents capable of binding to 

25 a polypeptide or protein encoded by one of the ORFs of the present invention. 
Specifically, such agents include, as further described below, antibodies, peptides, 
carbohydrates, pharmaceutical agents and the like. Such methods comprise steps 
of: (a) contacting an agent with an isolated protein encoded by one of the ORFs of 
the present invention; and (b) determining whether the agent binds to said protein. 

30 The present genomic sequences of Streptococcus pneumoniae will be of 

great value to all laboratories working with this organism and for a variety of 
commercial purposes. Many fragments of the Streptococcus pneumoniae genome 
will be immediately identified by similarity searches against GenBank or protein 
databases and will be of immediate value to Streptococcus pneumoniae researchers 
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and for immediate commercial value for the production of proteins or to control 
gene expression. 

The methodology and technology for elucidating extensive genomic 
sequences of bacterial and other genomes has and will greatly enhance the ability to 
5 analyze and understand chromosomal organization. In particular, sequenced 
contigs and genomes will provide the models for developing tools for the analysis 
of chromosome structure and function, including the ability to identify genes within 
large segments of genomic DNA, the structure, position, and spacing of regulatory 
elements, the identification of genes with potential industrial applications, and the 
10 ability to do comparative genomic and molecular phylogeny. 

DESCRIPTION O F THE FIGURES 

FIGURE 1 is a block diagram of a computer system (102) that can be 
15 used to implement computer-based systems of present invention. 

FIGURE 2 is a schematic diagram depicting the data flow and computer 
programs used to collect, assemble, edit and annotate the contigs of the 
Streptococcus pneumoniae genome of the present invention. Both Macintosh and 

20 Unix platforms are used to handle the AB 373 and 377 sequence data files, largely 
as described in Kerlavage et al., Proceedings of the Twenty-Sixth Annual Hawaii 
International Conference on System Sciences, 585, IEEE Computer Society Press, 
Washington D.C. (1993). Factura (AB) is a Macintosh program designed for 
automatic vector sequence removal and end-trimming of sequence files. The 

25 program Loadis runs on a Macintosh platform and parses the feature data extracted 
from the sequence files by Factura to the Unix based Streptococcus pneumoniae 
relational database. Assembly of contigs (and whole genome sequences) is 
accomplished by retrieving a specific set of sequence files and their associated 
features using Extrseq, a Unix utility for retrieving sequences from an SQL 

30 database. The resulting sequence file is processed by seq_filter to trim portions of 
the sequences with more than 2% ambiguous nucleotides. The sequence files were 
assembled using TIGR Assembler, an assembly engine designed at The Institute 
for Genomic Research ( TIGR ) for rapid and accurate assembly of thousands of 
sequence fragments. The collection of contigs generated by the assembly step is 

35 loaded into the database with the lassie program. Identification of open reading 
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frames (ORFs) is accomplished by processing contigs with zorf or GenMark. The 
ORFs are searched against S. pneumoniae sequences from GenBank and against all 
protein sequences using the BLASTN and BLASTP programs, described in 
Altschul et ai, J. Mol. Biol. 215: 403-410 (1990)). Results of the ORF 
5 determination and similarity searching steps were loaded into the database. As 
described below, some results of the determination and the searches are set out in 
Tables 1-3. 

DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS 

10 

The present invention is based on the sequencing of fragments of the 
Streptococcus pneumoniae genome and analysis of the sequences. The primary 
nucleotide sequences generated by sequencing the fragments are provided in SEQ 
ID NOS:l-391. (As used herein, the "primary sequence" refers to the nucleotide 

1 5 sequence represented by the IUPAC nomenclature system. ) 

In addition to the aforementioned Streptococcus pneumoniae polynucleotide 
and polynucleotide sequences, the present invention provides the nucleotide 
sequences of SEQ ID NOS: 1-391, or representative fragments thereof, in a form 
which can be readily used, analyzed, and interpreted by a skilled artisan. 

20 As used herein, a "representative fragment of the nucleotide sequence 

depicted in SEQ ID NOS: 1-391" refers to any portion of the SEQ ID NOS: 1-391 
which is not presently represented within a publicly available database. Preferred 
representative fragments of the present invention are Streptococcus pneumoniae 
open reading frames ( ORFs ), expression modulating fragment ( EMFs ) and 

25 fragments which can be used to diagnose the presence of Streptococcus 
pneumoniae in sample ( DFs ). A non-limiting identification of preferred 
representative fragments is provided in Tables 1-3. As discussed in detail below, 
the information provided in SEQ ID NOS: 1-391 and in Tables 1-3 together with 
routine cloning, synthesis, sequencing and assay methods will enable those skilled 

30 in the art to clone and sequence all "representative fragments" of interest, including 
open reading frames encoding a large variety of Streptococcus pneumoniae 
proteins. 

While the presently disclosed sequences of SEQ ID NOS: 1-391 are highly 
accurate, sequencing techniques are not perfect and, in relatively rare instances, 
35 further investigation of a fragment or sequence of the invention may reveal a 



WO 98/18931 PCT/US97/19588 



nucleotide sequence error present in a nucleotide sequence disclosed in SEQ ID 
NOS: 1-391. However, once the present invention is made available {i.e., once the 
information in SEQ ID NOS: 1-391 and Tables 1-3 has been made available), 
resolving a rare sequencing error in SEQ ID NOS: 1-391 will be well within the 
5 skill of the art. The present disclosure makes available sufficient sequence 
information to allow any of the described contigs or portions thereof to be obtained 
readily by straightforward application of routine techniques. Further sequencing of 
such polynucleotide may proceed in like manner using manual and automated 
sequencing methods which are employed ubiquitous in the art. Nucleotide 

10 sequence editing software is publicly available. For example, Applied Biosystem's 
(AB) AutoAssembler can be used as an aid during visual inspection of nucleotide 
sequences. By employing such routine techniques potential errors readily may be 
identified and the correct sequence then may be ascertained by targeting further 
sequencing effort, also of a routine nature, to the region containing the potential 

15 error. 

Even if all of the very rare sequencing errors in SEQ ID NOS: 1-391 were 
corrected, the resulting nucleotide sequences would still be at least 95% identical, 
nearly all would be at least 99% identical, and the great majority would be at least 
99.9% identical to the nucleotide sequences of SEQ ID NOS: 1-391. 

20 As discussed elsewhere herein, polynucleotides of the present invention 

readily may be obtained by routine application of well known and standard 
procedures for cloning and sequencing DNA. Detailed methods for obtaining 
libraries and for sequencing are provided below, for instance. A wide variety of 
Streptococcus pneumoniae strains that can be used to prepare S. pneumoniae 

25 genomic DNA for cloning and for obtaining polynucleotides of the present 
invention are available to the public from recognized depository institutions, such 
as the American Type Culture Collection ( ATCC ). While the present invention is 
enabled by the sequences and other information herein disclosed, the S. 
pneumoniae strain that provided the DNA of the present Sequence Listing, Strain 

30 7/87 14.8.91, has been deposited in the ATCC, as a convenience to those of skill 
in the art. As a further convenience, a library of S. pneumoniae genomic DNA, 
derived from the same strain, also has been deposited in the ATCC. The S. 
pneumoniae strain was deposited on October 10, 1996, and was given Deposit No. 
55840, and the cDNA library was deposited on October 11, 1996 and was given 

35 Deposit No. 97755. The genomic fragments in the library are 15 to 20 kb 
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fragments generated by partial Sau3Al digestion and they are inserted into the 
BamHI site in the well-known lambda-derived vector lambda DASH II (Stratagene, 
La Jolla, CA). The provision of the deposits is not a waiver of any rights of the 
inventors or their assignees in the present subject matter. 
5 The nucleotide sequences of the genomes from different strains of 

Streptococcus pneumoniae differ somewhat. However, the nucleotide sequences 
of the genomes of all Streptococcus pneumoniae strains will be at least 95% 
identical, in corresponding part, to the nucleotide sequences provided in SEQ ID 
NOS: 1-391. Nearly all will be at least 99% identical and the great majority will be 

10 99.9% identical. 

Thus, the present invention further provides nucleotide sequences which 
are at least 95%, preferably 99% and most preferably 99.9% identical to the 
nucleotide sequences of SEQ ID NOS: 1-391, in a form which can be readily used, 
analyzed and interpreted by the skilled artisan. 

15 Methods for determining whether a nucleotide sequence is at least 95%, at 

least 99% or at least 99.9% identical to the nucleotide sequences of SEQ ID 
NOS: 1-391 are routine and readily available to the skilled artisan. For example, the 
well known fasta algorithm described in Pearson and Lipman, Proc. Natl. Acad. 
Sci. USA 85: 2444 (1988) can be used to generate the percent identity of nucleotide 

20 sequences. The BLASTN program also can be used to generate an identity score 
of polynucleotides compared to one another. 

COMPUTER RELATED EMBODIMENTS 

The nucleotide sequences provided in SEQ ID NOS: 1-391, a representative 
25 fragment thereof, or a nucleotide sequence at least 95%, preferably at least 99% 
and most preferably at least 99.9% identical to a polynucleotide sequence of SEQ 
ID NOS: 1-391 may be "provided" in a variety of mediums to facilitate use thereof. 
As used herein, provided refers to a manufacture, other than an isolated nucleic 
acid molecule, which contains a nucleotide sequence of the present invention; i.e., 
30 a nucleotide sequence provided in SEQ ID NOS: 1-391, a representative fragment 
thereof, or a nucleotide sequence at least 95%, preferably at least 99% and most 
preferably at least 99.9% identical to a polynucleotide of SEQ ID NOS: 1-391. 
Such a manufacture provides a large portion of the Streptococcus pneumoniae 
genome and parts thereof {e.g., a Streptococcus pneumoniae open reading frame 
35 (ORF)) in a form which allows a skilled artisan to examine the manufacture using 
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means not directly applicable to examining the Streptococcus pneumoniae genome 
or a subset thereof as it exists in nature or in purified form. 

In one application of this embodiment, a nucleotide sequence of the present 
invention can be recorded on computer readable media. As used herein, "computer 
5 readable media" refers to any medium which can be read and accessed directly by a 
computer. Such media include, but are not limited to: magnetic storage media, 
such as floppy discs, hard disc storage medium, and magnetic tape; optical storage 
media such as CD- ROM; electrical storage media such as RAM and ROM; and 
hybrids of these categories, such as magnetic/optical storage media. A skilled 

in artisan can readily appreciate how any of the presently known computer readable 
mediums can be used to create a manufacture comprising computer readable 
medium having recorded thereon a nucleotide sequence of the present invention. 
Likewise, it will be clear to those of skill how additional computer readable media 
that may be developed also can be used to create analogous manufactures having 

1 5 recorded thereon a nucleotide sequence of the present invention. 

As used herein, "recorded" refers to a process for storing information on 
computer readable medium. A skilled artisan can readily adopt any of the presently 
know methods for recording information on computer readable medium to generate 
manufactures comprising the nucleotide sequence information of the present 

20 invention. A variety of data storage structures are available to a skilled artisan 
for creating a computer readable medium having recorded thereon a nucleotide 
sequence of the present invention. The choice of the data storage structure will 
generally be based on the means chosen to access the stored information. In 
addition, a variety of data processor programs and formats can be used to store the 

25 nucleotide sequence information of the present invention on computer readable 
medium. The sequence information can be represented in a word processing text 
file, formatted in commercially- available software such as WordPerfect and 
Microsoft Word, or represented in the form of an ASCII file, stored in a database 
application, such as DB2, Sybase, Oracle, or the like. A skilled artisan can readily 

30 adapt any number of data-processor structuring formats (e.g., text file or database) 
in order to obtain computer readable medium having recorded thereon the 
nucleotide sequence information of the present invention. 

Computer software is publicly available which allows a skilled artisan to 
access sequence information provided in a computer readable medium. Thus, by 

35 providing in computer readable form the nucleotide sequences of SEQ ID NOS: 1- 
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391, a representative fragment thereof, or a nucleotide sequence at least 95%, 
preferably at least 99% and most preferably at least 99.9% identical to a sequence 
of SEQ ID NOS: 1-391 the present invention enables the skilled artisan routinely to 
access the provided sequence information for a wide variety of purposes. 
5 The examples which follow demonstrate how software which implements 

the BLAST (Altschul et al., J. Mol. Biol. 275:403-410 (1990)) and BLAZE 
(Brutlag etai, Comp. Chem. 77:203-207 (1993)) search algorithms on a Sybase 
system was used to identify open reading frames (ORFs) within the Streptococcus 
pneumoniae genome which contain homology to ORFs or proteins from both 

1 0 Streptococcus pneumoniae and from other organisms. Among the ORFs discussed 
herein are protein encoding fragments of the Streptococcus pneumoniae genome 
useful in producing commercially important proteins, such as enzymes used in 
fermentation reactions and in the production of commercially useful metabolites. 

The present invention further provides systems, particularly computer- 

15 based systems, which contain the sequence information described herein. Such 
systems are designed to identify, among other things, commercially important 
fragments of the Streptococcus pneumoniae genome. 

As used herein, "a computer-based system" refers to the hardware means, 
software means, and data storage means used to analyze the nucleotide sequence 

20 information of the present invention. The minimum hardware means of the 
computer-based systems of the present invention comprises a central processing 
unit (CPU), input means, output means, and data storage means. A skilled artisan 
can readily appreciate that any one of the currently available computer-based 
systems are suitable for use in the present invention. 

25 As stated above, the computer-based systems of the present invention 

comprise a data storage means having stored therein a nucleotide sequence of the 
present invention and the necessary hardware means and software means for 
supporting and implementing a search means. 

As used herein, "data storage means" refers to memory which can store 

30 nucleotide sequence information of the present invention, or a memory access 
means which can access manufactures having recorded thereon the nucleotide 
sequence information of the present invention. 

As used herein, "search means" refers to one or more programs which are 
implemented on the computer-based system to compare a target sequence or target 

35 structural motif with the sequence information stored within the data storage 
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means. Search means are used to identify fragments or regions of the present 
genomic sequences which match a particular target sequence or target motif. A 
variety of known algorithms are disclosed publicly and a variety of commercially 
available software for conducting search means are and can be used in the 
5 computer-based systems of the present invention. Examples of such software 
includes, but is not limited to, MacPattern (EMBL), BLASTN and BLASTX 
(NCBIA). A skilled artisan can readily recognize that any one of the available 
algorithms or implementing software packages for conducting homology searches 
can be adapted for use in the present computer-based systems. 

10 As used herein, a "target sequence" can be any DNA or amino acid 

sequence of six or more nucleotides or two or more amino acids. A skilled artisan 
can readily recognize that the longer a target sequence is, the less likely a target 
sequence will be present as a random occurrence in the database. The most 
preferred sequence length of a target sequence is from about 10 to 100 amino acids 

15 or from about 30 to 300 nucleotide residues. However, it is well recognized that 
searches for commercially important fragments, such as sequence fragments 
involved in gene expression and protein processing, may be of shorter length. 

As used herein, "a target structural motif," or "target motif," refers to any 
rationally selected sequence or combination of sequences in which the sequence(s) 

20 are chosen based on a three-dimensional configuration which is formed upon the 
folding of the target motif. There are a variety of target motifs known in the art. 
Protein target motifs include, but are not limited to, enzymic active sites and signal 
sequences. Nucleic acid target motifs include, but are not limited to, promoter 
sequences, hairpin structures and inducible expression elements (protein binding 

25 sequences). 

A variety of structural formats for the input and output means can be used 
to input and output the information in the computer-based systems of the present 
invention. A preferred format for an output means ranks fragments of the 
Streptococcus pneumoniae genomic sequences possessing varying degrees of 

30 homology to the target sequence or target motif. Such presentation provides a 
skilled artisan with a ranking of sequences which contain various amounts of the 
target sequence or target motif and identifies the degree of homology contained in 
the identified fragment. 

A variety of comparing means can be used to compare a target sequence or 

35 target motif with the data storage means to identify sequence fragments of the 
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Streptococcus pneumoniae genome. In the present examples, implementing 
software which implement the BLAST and BLAZE algorithms, described in 
Altschul et al., J. Mol. Biol. 215: 403-410 (1990), is used to identify open reading 
frames within the Streptococcus pneumoniae genome. A skilled artisan can readily 
5 recognize that any one of the publicly available homology search programs can be 
used as the search means for the computer-based systems of the present invention. 
Of course, suitable proprietary systems that may be known to those of skill also 
may be employed in this regard. 

Figure 1 provides a block diagram of a computer system illustrative of 

10 embodiments of this aspect of present invention. The computer system 102 
includes a processor 106 connected to a bus 104. Also connected to the bus 104 
are a main memory 108 (preferably implemented as random access memory, RAM) 
and a variety of secondary storage devices 1 10, such as a hard drive 112 and a 
removable medium storage device 1 14. The removable medium storage device 1 14 

1 5 may represent, for example, a floppy disk drive, a CD-ROM drive, a magnetic tape 
drive, etc. A removable storage medium 116 (such as a floppy disk, a compact 
disk, a magnetic tape, etc.) containing control logic and/or data recorded therein 
may be inserted into the removable medium storage device 114. The computer 
system 102 includes appropriate software for reading the control logic and/or the 

20 data from the removable medium storage device 1 14, once it is inserted into the 
removable medium storage device 1 14. 

A nucleotide sequence of the present invention may be stored in a well 
known manner in the main memory 108, any of the secondary storage devices 1 10, 
and/or a removable storage medium 116. During execution, software for accessing 

25 and processing the genomic sequence (such as search tools, comparing tools, etc.) 
reside in main memory 108, in accordance with the requirements and operating 
parameters of the operating system, the hardware system and the software program 
or programs. 
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BIOCHEMICAL EMBODIMENTS 

Other embodiments of the present invention are directed to isolated 
fragments of the Streptococcus pneumoniae genome. The fragments of the 
5 Streptococcus pneumoniae genome of the present invention include, but are not 
limited to fragments which encode peptides and polypeptides, hereinafter open 
reading frames (ORFs), fragments which modulate the expression of an operably 
linked ORF, hereinafter expression modulating fragments (EMFs) and fragments 
which can be used to diagnose the presence of Streptococcus pneumoniae in a 

10 sample, hereinafter diagnostic fragments (DFs). 

As used herein, an "isolated nucleic acid molecule" or an "isolated fragment 
of the Streptococcus pneumoniae genome" refers to a nucleic acid molecule 
possessing a specific nucleotide sequence which has been subjected to purification 
means to reduce, from the composition, the number of compounds which arc 

15 normally associated with the composition. Particularly, the term refers to the 
nucleic acid molecules having the sequences set out in SEQ ID NOS: 1-391, to 
representative fragments thereof as described above, to polynucleotides at least 
95%, preferably at least 99% and especially preferably at least 99.9% identical in 
sequence thereto, also as set out above. 

20 A variety of purification means can be used to generate the isolated 

fragments of the present invention. These include, but are not limited to methods 
which separate constituents of a solution based on charge, solubility, or size. 

In one embodiment. Streptococcus pneumoniae. DNA can be enzymatically 
sheared to produce fragments of 15-20 kb in length. These fragments can then be 

25 used to generate a Streptococcus pneumoniae library by inserting them into lambda 
clones as described in the Examples below. Primers flanking, for example, an 
ORF, such as those enumerated in Tables 1-3 can then be generated using 
nucleotide sequence information provided in SEQ ID NOS: 1-391. Well known 
and routine techniques of PCR cloning then can be used to isolate the ORF from 

30 the lambda DNA library or Streptococcus pneumoniae genomic DNA. Thus, given 
the availability of SEQ ID NOS: 1-391, the information in Tables 1, 2 and 3, and 
the information that may be obtained readily by analysis of the sequences of SEQ 
ID NOS: 1-391 using methods set out above, those of skill will be enabled by the 
present disclosure to isolate any ORF-containing or other nucleic acid fragment of 

35 the present invention. 
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The isolated nucleic acid molecules of the present invention include, but are 
not limited to single stranded and double stranded DNA, and single stranded RNA. 

As used herein, an "open reading frame," ORF, means a series of triplets 
coding for amino acids without any termination codons and is a sequence 
5 translatable into protein. 

Tables 1, 2, and 3 list ORFs in the Streptococcus pneumoniae genomic 
contigs of the present invention that were identified as putative coding regions by 
the GeneMark software using organism-specific second-order Markov probability 
transition matrices. It will be appreciated that other criteria can be used, in 
10 accordance with well known analytical methods, such as those discussed herein, to 
generate more inclusive, more restrictive, or more selective lists. 

Table 1 sets out ORFs in the Streptococcus pneumoniae contigs of the 
present invention that over a continuous region of at least 50 bases are 95% or 
more identical (by BLAST analysis) to a nucleotide sequence available through 
15 GenBank in October, 1997. 

Table 2 sets out ORFs in the Streptococcus pneumoniae contigs of the 
present invention that are not in Table 1 and match, with a BLASTP probability 
score of 0.01 or less, a polypeptide sequence available through GenBank in 
October, 1997. 

20 Table 3 sets out ORFs in the Streptococcus pneumoniae contigs of the 

present invention that do not match significantly, by BLASTP analysis, a 
polypeptide sequence available through GenBank in October, 1997. 

In each table, the first and second columns identify the ORF by, 
respectively, contig number and ORF number within the contig; the third column 

25 indicates the first nucleotide of the ORF (actually the first nucleotide of the stop 
codon immediately preceeding the ORF), counting from the 5' end of the contig 
strand; and the fourth column, "stop (nt)" indicates the last nucleotide of the stop 
codon defining the 3 'end of the ORF. 

In Tables 1 and 2, column five, lists the Reference for the closest 

30 matching sequence available through GenBank. These reference numbers are the 
databases entry numbers commonly used by those of skill in the art, who will be 
familiar with their denominators. Descriptions of the nomenclature are available 
from the National Center for Biotechnology Information. Column six in Tables 1 
and 2 provides the gene name of the matching sequence; column seven provides 

35 the BLAST identity score and column eight the BLAST similarity score from the 
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comparison of the ORF and the homologous gene; and column nine indicates the 
length in nucleotides of the highest scoring segment pair identified by the BLAST 
identity analysis. 

Each ORF described in the tables is defined by "start (nt)" (5') and "stop 
5 (nt)" (3') nucleotide position numbers. These position numbers refer to the 
boundaries of each ORF and provide orientation with respect to whether the 
forward or reverse strand is the coding strand and which reading frame the coding 
sequence is contained. The "start" position is the first nucleotide of the triplet 
encoding a stop codon just 5' to the ORF and the "stop" position is the last 

10 nucleotide of the triplet encoding the next in-frame stop codon (i.e., the stop codon 
at the 3' end of the ORF). Those of ordinary skill in the art appreciate that 
preferred fragments within each ORF described in the table include fragments of 
each ORF which include the entire sequence from the delineated "start" and "stop" 
positions excepting the first and last three nucleotides since these encode stop 

15 codons. Thus, polynucleotides set out as ORFs in the tables but lacking the three 
(3) 5' nucleotides and the three (3) 3' nucleotides are encompassed by the present 
invention. Those of skill also appreciate that particularly preferred are fragments 
within each ORF that are polynucleotide fragments comprising polypeptide coding 
sequence. As defined herein, "coding sequence" includes the fragment within an 

20 ORF beginning at the first in-frame ATG (triplet encoding methionine) and ending 
with the last nucleotide prior to the triplet encoding the 3' stop codon. Preferred 
are fragments comprising the entire coding sequence and fragments comprising the 
entire coding sequence, excepting the coding sequence for the N-terminal 
methionine. Those of skill appreciate that the N-terminal methionine is often 

25 removed during post-translational processing and that polynucleotides lacking the 
ATG can be used to facilitate production of N-termainal fusion proteins which may 
be benefical in the production or use of genetically engineered proteins. Of course, 
due to the degeneracy of the genetic code many polynucleotides can encode a given 
polypeptide. Thus, the invention further includes polynucleotides comprising a 

30 nucleotide sequence encoding a polypeptide sequence itself encoded by the coding 
sequence within an ORF described in Tables 1-3 herein. Further, polynucleotides 
at least 95%, preferably at least 99% and especially preferably at least 99.9% 
identical in sequence to the foregoing polynucleotides, are contemplated by the 
present invention. 
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Polypeptides encoded by polynucleotides described above and elsewhere 
herein are also provided by the present invention as are polypeptide comprising a 
an amino acid sequence at least about 95%, preferably at least 97% and even more 
preferably 99% identical to the amino acid sequence of a polypeptide encoded by an 
5 ORF shown in Tables 1-3. These polypeptides may or may not comprise an N- 
terminal methionine. 

The concepts of percent identity and percent similarity of two polypeptide 
sequences is well understood in the art. For example, two polypeptides 10 amino 
acids in length which differ at three amino acid positions (e.g., at positions 1, 3 
10 and 5) are said to have a percent identity of 70%. However, the same two 
polypeptides would be deemed to have a percent similarity of 80% if, for example 
at position 5, the amino acids moieties, although not identical, were "similar" (i.e., 
possessed similar biochemical characteristics). Many programs for analysis of 
nucleotide or amino acid sequence similarity, such as fasta and BLAST specifically 

15 list percent identity of a matching region as an output parameter. Thus, for 
instance, Tables 1 and 2 herein enumerate the percent identity of the highest 
scoring segment pair in each ORF and its listed relative. Further details 
concerning the algorithms and criteria used for homology searches are provided 
below and are described in the pertinent literature highlighted by the citations 

20 provided below. 

It will be appreciated that other criteria can be used to generate more 
inclusive and more exclusive listings of the types set out in the tables. As those of 
skill will appreciate, narrow and broad searches both are useful. Thus, a skilled 
artisan can readily identify ORFs in contigs of the Streptococcus pneumoniae 

25 genome other than those listed in Tables 1-3, such as ORFs which are overlapping 
or encoded by the opposite strand of an identified ORF in addition to those 
ascertainable using the computer-based systems of the present invention. 

As used herein, an "expression modulating fragment," EMF, means a 
series of nucleotide molecules which modulates the expression of an operably 

30 linked ORF or EMF. 
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As used herein, a sequence is said to "modulate the expression of an 
operably linked sequence" when the expression of the sequence is altered by the 
presence of the EMF. EMFs include, but are not limited to, promoters, and 
promoter modulating sequences (inducible elements). One class of EMFs are 
5 fragments which induce the expression or an operably linked ORF in response to a 
specific regulatory factor or physiological event. 

EMF sequences can be identified within the contigs of the Streptococcus 
pneumoniae genome by their proximity to the ORFs provided in Tables 1-3. An 
intergenic segment, or a fragment of the intergenic segment, from about 10 to 200 

10 nucleotides in length, taken from any one of the ORFs of Tables 1-3 will modulate 
the expression of an operably linked ORF in a fashion similar to that found with the 
naturally linked ORF sequence. As used herein, an "intergenic segment" refers to 
fragments of the Streptococcus pneumoniae genome which are between two 
ORF(s) herein described. EMFs also can be identified using known EMFs as a 

15 target sequence or target motif in the computer-based systems of the present 
invention. Further, the two methods can be combined and used together. 

The presence and activity of an EMF can be confirmed using an EMF trap 
vector. An EMF trap vector contains a cloning site linked to a marker sequence. A 
marker sequence encodes an identifiable phenotype, such as antibiotic resistance or 

20 a complementing nutrition auxotrophic factor, which can be identified or assayed 
when the EMF trap vector is placed within an appropriate host under appropriate 
conditions. As described above, a EMF will modulate the expression of an 
operably linked marker sequence. A more detailed discussion of various marker 
sequences is provided below. A sequence which is suspected as being an EMF is 

25 cloned in all three reading frames in one or more restriction sites upstream from the 
marker sequence in the EMF trap vector. The vector is then transformed into an 
appropriate host using known procedures and the phenotype of the transformed 
host in examined under appropriate conditions. As described above, an EMF will 
modulate the expression of an operably linked marker sequence. 

30 As used herein, a "diagnostic fragment," DF, means a series of nucleotide 

molecules which selectively hybridize to Streptococcus pneumoniae sequences. 
DFs can be readily identified by identifying unique sequences within contigs of the 
Streptococcus pneumoniae genome, such as by using well-known computer 
analysis software, and by generating and testing probes or amplification primers 
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consisting of the DF sequence in an appropriate diagnostic format which 
determines amplification or hybridization selectivity. 

The sequences falling within the scope of the present invention are not 
limited to the specific sequences herein described, but also include allelic and 
5 species variations thereof. Allelic and species variations can be routinely 
determined by comparing the sequences provided in SEQ ID NOS: 1-391, a 
representative fragment thereof, or a nucleotide sequence at least 95%, preferrably 
at least 99% and most at least preferably 99.9% identical to SEQ ID NOS: 1-391, 
with a sequence from another isolate of the same species. Furthermore, to 
10 accommodate codon variability, the invention includes nucleic acid molecules 
coding for the same amino acid sequences as do the specific ORFs disclosed 
herein. In other words, in the coding region of an ORF, substitution of one codon 
for another which encodes the same amino acid is expressly contemplated. Any 
specific sequence disclosed herein can be readily screened for errors by 

15 resequencing a particular fragment, such as an ORF, in both directions (i.e., 
sequence both strands). Alternatively, error screening can be performed by 
sequencing corresponding polynucleotides of Streptococcus pneumoniae origin 
isolated by using part or all of the fragments in question as a probe or primer. 

Preferred DFs of the present invention comprise at least about 17, 

20 preferrably at least about 20, and more preferrably at least about 50 contiguous 
nucleotides within an ORF set out in Tables 1-3. Most highly preferred DFs 
specifically hybridize to a polynucleotide containing the sequence of the ORF from 
which they are derived. Specific hybridization occurs even under stringent 
conditions defined elsewhere herein. 

25 Each of the ORFs of the Streptococcus pneumoniae genome disclosed in 

Tables 1, 2 and 3, and the EMFs found 5' to the ORFs, can be used as 
polynucleotide reagents in numerous ways. For example, the sequences can be 
used as diagnostic probes or diagnostic amplification primers to detect the presence 
of a specific microbe in a sample, particularly Streptococcus pneumoniae. 

30 Especially preferred in this regard are ORFs such as those of Table 3, which do not 
match previously characterized sequences from other organisms and thus are most 
likely to be highly selective for Streptococcus pneumoniae. Also particularly 
preferred are ORFs that can be used to distinguish between strains of Streptococcus 
pneumoniae, particularly those that distinguish medically important strain, such as 

35 drug-resistant strains. 
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In addition, the fragments of the present invention, as broadly described, 
can be used to control gene expression through triple helix formation or antisense 
DNA or RNA, both of which methods are based on the binding of a polynucleotide 
sequence to DNA or RNA. Triple helix-formation optimally results in a shut-off of 
5 RNA transcription from DNA, while antisense RNA hybridization blocks 
translation of an mRNA molecule into polypeptide. Information from the 
sequences of the present invention can be used to design antisense and triple helix- 
forming oligonucleotides. Polynucleotides suitable for use in these methods are 
usually 20 to 40 bases in length and are designed to be complementary to a region 

10 of the gene involved in transcription, for triple-helix formation, or to the mRNA 
itself, for antisense inhibition. Both techniques have been demonstrated to be 
effective in model systems, and the requisite techniques are well known and 
involve routine procedures. Triple helix techniques arc discussed in, for example, 
Lee et al., Nucl. Acids Res. 6:3073 (1979); Cooney et al, Science 247:456 

15 (1988); and Dervan et al., Science 257:1360 (1991). Antisense techniques in 
general are discussed in, for instance, Okano, J. Neurochem. 56:560 (1991) and 
Oligodeoxynucleotid.es as Antisense Inhibitors of Gene Expression, CRC Press, 
Boca Raton, FL (1988)). 

The present invention further provides recombinant constructs comprising 

20 one or more fragments of the Streptococcus pneumoniae genomic fragments and 
contigs of the present invention. Certain preferred recombinant constructs of the 
present invention comprise a vector, such as a plasmid or viral vector, into which a 
fragment of the Streptococcus pneumoniae genome has been inserted, in a forward 
or reverse orientation. In the case of a vector comprising one of the ORFs of the 

25 present invention, the vector may further comprise regulatory sequences, including 
for example, a promoter, operably linked to the ORF. For vectors comprising the 
EMFs of the present invention, the vector may further comprise a marker sequence 
or heterologous ORF operably linked to the EMF. 

Large numbers of suitable vectors and promoters are known to those of 

30 skill in the art and are commercially available for generating the recombinant 
constructs of the present invention. The following vectors are provided by way of 
example. Useful bacterial vectors include phagescript, PsiX174, pBluescript SK, 
pBS KS, pNH8a, pNH16a, pNH18a, P NH46a (available from Stratagene); 
P Trc99A, pKK223-3, pKK233-3, pDR540, pRIT5 (available from Pharmacia). 

35 Useful eukaryotic vectors include pWLneo, pSV2cat, pOG44, pXTl, pSG 
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(available from Stratagene) pSVK3, pBPV, pMSG, pSVL (available from 
Pharmacia). 

Promoter regions can be selected from any desired gene using CAT 
(chloramphenicol transferase) vectors or other vectors with selectable markers. 
5 Two appropriate vectors are pKK232-8 and pCM7. Particular named bacterial 
promoters include lad, lacZ, T3, T7, gpt, lambda PR, and trc. Eukaryotic 
promoters include CMV immediate early, HSV thymidine kinase, early and late 
SV40, LTRs from retrovirus, and mouse metallothionein- I. Selection of the 
appropriate vector and promoter is well within the level of ordinary skill in the art. 

10 The present invention further provides host cells containing any one of the 

isolated fragments of the Streptococcus pneumoniae genomic fragments and 
contigs of the present invention, wherein the fragment has been introduced into the 
host cell using known methods. The host cell can be a higher eukaryotic host 
cell, such as a mammalian cell, a lower eukaryotic host cell, such as a yeast cell, or 

15 a procaryotic cell, such as a bacterial cell. 

A polynucleotide of the present invention, such as a recombinant construct 
comprising an ORF of the present invention, may be introduced into the host by a 
variety of well established techniques that are standard in the art, such as calcium 
phosphate transfection, DEAE, dextran mediated transfection and electroporation, 

20 which are described in, for instance, Davis, L. et al, BASIC METHODS IN 
MOLECULAR BIOLOGY (1986). 

A host cell containing one of the fragments of the Streptococcus 
pneumoniae genomic fragments and contigs of the present invention, can be used 
in conventional manners to produce the gene product encoded by the isolated 

25 fragment (in the case of an ORF) or can be used to produce a heterologous protein 
under the control of the EMF. The present invention further provides 

isolated polypeptides encoded by the nucleic acid fragments of the present 
invention or by degenerate variants of the nucleic acid fragments of the present 
invention. By "degenerate variant" is intended nucleotide fragments which differ 

30 from a nucleic acid fragment of the present invention (e.g., an ORF) by nucleotide 
sequence but, due to the degeneracy of the Genetic Code, encode an identical 
polypeptide sequence. 

Preferred nucleic acid fragments of the present invention are the ORFs and 
subfragments thereof depicted in Tables 2 and 3 which encode proteins. 
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A variety of methodologies known in the art can be utilized to obtain any 
one of the isolated polypeptides or proteins of the present invention. At the 
simplest level, the amino acid sequence can be synthesized using commercially 
available peptide synthesizers. This is particularly useful in producing small 
5 peptides and fragments of larger polypeptides. Such short fragments as may be 
obtained most readily by synthesis are useful, for example, in generating antibodies 
against the native polypeptide, as discussed further below. 

In an alternative method, the polypeptide or protein is purified from 
bacterial cells which naturally produce the polypeptide or protein. One skilled in 

10 the art can readily employ well-known methods for isolating polypeptides and 
proteins to isolate and purify polypeptides or proteins of the present invention 
produced naturally by a bacterial strain, or by other methods. Methods for 
isolation and purification that can be employed in this regard include, but are not 
limited to, immunochromatography, HPLC, size-exclusion chromatography, ion- 

15 exchange chromatography, and immuno-affinity chromatography. 

The polypeptides and proteins of the present invention also can be purified 
from cells which have been altered to express the desired polypeptide or protein. 
As used herein, a cell is said to be altered to express a desired polypeptide or 
protein when the cell, through genetic manipulation, is made to produce a 

20 polypeptide or protein which it normally does not produce or which the cell 
normally produces at a lower level. Those skilled in the art can readily adapt 
procedures for introducing and expressing either recombinant or synthetic 
sequences into eukaryotic or prokaryotic cells in order to generate a cell which 
produces one of the polypeptides or proteins of the present invention. 

25 Any host/vector system can be used to express one or more of the ORFs of 

the present invention. These include, but are not limited to, eukaryotic hosts such 
as HeLa cells, CV-1 cell, COS cells, and Sf9 cells, as well as prokaryotic host 
such as E. culi and B. subtilis. The most preferred cells are those which do not 
normally express the particular polypeptide or protein or which expresses the 

30 polypeptide or protein at low natural level. 
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"Recombinant," as used herein, means that a polypeptide or protein is 
derived from recombinant {e.g., microbial or mammalian) expression systems. 
"Microbial" refers to recombinant polypeptides or proteins made in bacterial or 
fungal (e.g., yeast) expression systems. As a product, "recombinant 
5 microbial"defines a polypeptide or protein essentially free of native endogenous 
substances and unaccompanied by associated native glycosylation. Polypeptides or 
proteins expressed in most bacterial cultures, e.g., E. coli, will be free of 
glycosylation modifications; polypeptides or proteins expressed in yeast will have a 
glycosylation pattern different from that expressed in mammalian cells. 

10 "Nucleotide sequence" refers to a heteropolymer of deoxy ribonucleotides. 

Generally, DNA segments encoding the polypeptides and proteins provided by this 
invention are assembled from fragments of the Streptococcus pneumoniae genome 
and short oligonucleotide linkers, or from a series of oligonucleotides, to provide a 
synthetic gene which is capable of being expressed in a recombinant transcriptional 

1 5 unit comprising regulatory elements derived from a microbial or viral operon. 

Recombinant expression vehicle or vector" refers to a plasmid or phage or 
virus or vector, for expressing a polypeptide from a DNA (RNA) sequence. The 
expression vehicle can comprise a transcriptional unit comprising an assembly of 
(1) a genetic regulatory elements necessary for gene expression in the host, 

20 including elements required to initiate and maintain transcription at a level sufficient 
for suitable expression of the desired polypeptide, including, for example, 
promoters and, where necessary, an enhancer and a polyadenylation signal; (2) a 
structural or coding sequence which is transcribed into mRNA ana translated into 
protein, and (3) appropriate signals to initiate translation at the beginning of the 

25 desired coding region and terminate translation at its end. Structural units intended 
for use in yeast or eukaryotic expression systems preferably include a leader 
sequence enabling extracellular secretion of translated protein by a host cell. 
Alternatively, where recombinant protein is expressed without a leader or transport 
sequence, it may include an N-terminal methionine residue. This residue may or 

30 may not be subsequently cleaved from the expressed recombinant protein to 
provide a final product. 

"Recombinant expression system" means host cells which have stably 
integrated a recombinant transcriptional unit into chromosomal DNA or carry the 
recombinant transcriptional unit extra chromosomally. The cells can be prokaryotic 

35 or eukaryotic. Recombinant expression systems as defined herein will express 
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heterologous polypeptides or proteins upon induction of the regulatory elements 
linked to the DNA segment or synthetic gene to be expressed. 

Mature proteins can be expressed in mammalian cells, yeast, bacteria, or 
other cells under the control of appropriate promoters. Cell-free translation 
5 systems can also be employed to produce such proteins using RNAs derived from 
the DNA constructs of the present invention. Appropriate cloning and expression 
vectors for use with prokaryotic and eukaryotic hosts are described in Sambrook et 
al, Molecular Cloning: A Laboratory Manual, 2 nd Edition, Cold Spring Harbor 
Laboratory Press, Cold Spring Harbor, New York (1989), the disclosure of which 

10 is hereby incorporated by reference in its entirety. 

Generally, recombinant expression vectors will include origins of 
replication and selectable markers permitting transformation of the host cell, e.g., 
the ampicillin resistance gene of E. coli and S. cerevisiae TRP1 gene, and a 
promoter derived from a highly expressed gene to direct transcription of a 

15 downstream structural sequence. Such promoters can be derived from operons 
encoding glycolytic enzymes such as 3- phosphoglycerate kinase (PGK), alpha- 
factor, acid phosphatase, or heat shock proteins, among others. The heterologous 
structural sequence is assembled in appropriate phase with translation initiation and 
termination sequences, and preferably, a leader sequence capable of directing 

20 secretion of translated protein into the periplasmic space or extracellular medium. 
Optionally, the heterologous sequence can encode a fusion protein including an N- 
terminal identification peptide imparting desired characteristics, e.g., stabilization 
or simplified purification of expressed recombinant product. 

Useful expression vectors for bacterial use are constructed by inserting a 

25 structural DNA sequence encoding a desired protein together with suitable 
translation initiation and termination signals in operable reading phase with a 
functional promoter. The vector will comprise one or more phenotypic selectable 
markers and an origin of replication to ensure maintenance of the vector and, when 
desirable, provide amplification within the host. 

30 Suitable prokaryotic hosts for transformation include strains of E. coli, B. 

subtilis, Salmonella typhimurium and various species within the genera 
Pseudomonas and Streptomyces. Others may, also be employed as a matter of 
choice. 

As a representative but non-limiting example, useful expression vectors for 
35 bacterial use can comprise a selectable marker and bacterial origin of replication 
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derived from commercially available plasmids comprising genetic elements of the 
well known cloning vector pBR322 (ATCC 37017). Such commercial vectors 
include, for example, pKK223-3 (available form Pharmacia Fine Chemicals, 
Uppsala, Sweden) and GEM 1 (available from Promega Biotec, Madison, WI, 
5 USA). These pBR322 "backbone" sections are combined with an appropriate 
promoter and the structural sequence to be expressed. 

Following transformation of a suitable host strain and growth of the host 
strain to an appropriate cell density, the selected promoter, where it is inducible, is 
derepressed or induced by appropriate means (e.g., temperature shift or chemical 

10 induction) and cells are cultured for an additional period to provide for expression 
of the induced gene product. Thereafter cells arc typically harvested, generally by 
centrifugation, disrupted to release expressed protein, generally by physical or 
chemical means, and the resulting crude extract is retained for further purification. 

Various mammalian cell culture systems can also be employed to express 

15 recombinant protein. Examples of mammalian expression systems include the 
COS-7 lines of monkey kidney fibroblasts, described in Gluzman, Cell 23:115 
(1981), and other cell lines capable of expressing a compatible vector, for example, 
the CI 27, 3T3, CHO, HeLa and BHK cell lines. 

Mammalian expression vectors will comprise an origin of replication, a 

20 suitable promoter and enhancer, and also any necessary ribosome binding sites, 
polyadenylation site, splice donor and acceptor sites, transcriptional termination 
sequences, and 5' flanking nontranscribed sequences. DNA sequences derived 
from the SV40 viral genome, for example, SV40 origin, early promoter, enhancer, 
splice, and polyadenylation sites may be used to provide the required 

25 nontranscribed genetic elements. 

Recombinant polypeptides and proteins produced in bacterial culture is 
usually isolated by initial extraction from cell pellets, followed by one or more 
salting-out, aqueous ion exchange or size exclusion chromatography steps. 
Microbial cells employed in expression of proteins can be disrupted by any 

30 convenient method, including freeze-thaw cycling, sonication, mechanical 
disruption, or use of cell lysing agents. Protein refolding steps can be used, as 
necessary, in completing configuration of the mature protein. Finally, high 
performance liquid chromatography (HPLC) can be employed for final purification 
steps. 
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The present invention further includes isolated polypeptides, proteins and 
nucleic acid molecules which are substantially equivalent to those herein described. 
As used herein, substantially equivalent can refer both to nucleic acid and amino 
acid sequences, for example a mutant sequence, that varies from a reference 

5 sequence by one or more substitutions, deletions, or additions, the net effect of 
which does not result in an adverse functional dissimilarity between reference and 
subject sequences. For purposes of the present invention, sequences having 
equivalent biological activity, and equivalent expression characteristics are 
considered substantially equivalent. For purposes of determining equivalence, 

10 truncation of the mature sequence should be disregarded. 

The invention further provides methods of obtaining homologs from other 
strains of Streptococcus pneumoniae, of the fragments of the Streptococcus 
pneumoniae genome of the present invention and homologs of the proteins encoded 
by the ORFs of the present invention. As used herein, a sequence or protein of 

15 Streptococcus pneumoniae is defined as a homolog of a fragment of the 
Streptococcus pneumoniae fragments or contigs or a protein encoded by one of the 
ORFs of the present invention, if it shares significant homology to one of the 
fragments of the Streptococcus pneumoniae genome of the present invention or a 
protein encoded by one of the ORFs of the present invention. Specifically, by 

20 using the sequence disclosed herein as a probe or as primers, and techniques such 
as PCR cloning and colony/plaque hybridization, one skilled in the art can obtain 
homologs. 

As used herein, two nucleic acid molecules or proteins are said to "share 
significant homology" if the two contain regions which possess greater than 85% 

25 sequence (amino acid or nucleic acid) homology. Preferred homologs in this 
regard are those with more than 90% homology. Especially preferred are those 
with 93% or more homology. Among especially preferred homologs those with 
95% or more homology are particularly preferred. Very particularly preferred 
among these are those with 97% and even more particularly preferred among those 

30 are homologs with 99% or more homology. The most preferred homologs among 
these are those with 99.9% homology or more. It will be understood that, among 
measures of homology, identity is particularly preferred in this regard. 

Region specific primers or probes derived from the nucleotide sequence 
provided in SEQ ID NOS: 1-391 or from a nucleotide sequence at least 95%, 

35 particularly at least 99%, especially at least 99.5% identical to a sequence of SEQ 



WO 98/18931 PCT7US97/19588 



ID NOS: 1-391 can be used to prime DNA synthesis and PCR amplification, as 
well as to identify colonies containing cloned DNA encoding a homolog. Methods 
suitable to this aspect of the present invention are well known and have been 
described in great detail in many publications such as, for example, Innis et al, 
5 PCR Protocols, Academic Press, San Diego, CA ( 1 990)). 

When using primers derived from SEQ ID NOS: 1-391 or from a nucleotide 
sequence having an aforementioned identity to a sequence of SEQ ID NOS: 1-391, 
one skilled in the art will recognize that by employing high stringency conditions 
(e.g., annealing at 50-60°C in 6X SSPC and 50% formamide, and washing at 50- 

10 65°C in 0.5X SSPC) only sequences which are greater than 75% homologous to 
the primer will be amplified. By employing lower stringency conditions (e.g., 
hybridizing at 35-37°C in 5X SSPC and 40-45% formamide, and washing at 42°C 
in 0.5X SSPC), sequences which are greater than 40-50% homologous to the 
primer will also be amplified. 

15 When using DNA probes derived from SEQ ID NOS: 1-391, or from a 

nucleotide sequence having an aforementioned identity to a sequence of SEQ ID 
NOS: 1-391, for colony/plaque hybridization, one skilled in the art will recognize 
that by employing high stringency conditions (e.g., hybridizing at 50- 65°C in 5X 
SSPC and 50% formamide, and washing at 50- 65°C in 0.5X SSPC), sequences 

20 having regions which are greater than 90% homologous to the probe can be 
obtained, and that by employing lower stringency conditions (e.g., hybridizing at 
35-37°C in 5X SSPC and 40-45% formamide, and washing at 42°C in 0.5X 
SSPC), sequences having regions which are greater than 35-45% homologous to 
the probe will be obtained. 

25 Any organism can be used as the source for homologs of the present 

invention so long as the organism naturally expresses such a protein or contains 
genes encoding the same. The most preferred organism for isolating homologs are 
bacteria which are closely related to Streptococcus pneumoniae. 

30 ILLUSTRATIVE USES OF COMPOSITIONS OF THE 

INVENTION 

Each ORF provided in Tables 1 and 2 is identified with a function by 
homology to a known gene or polypeptide. As a result, one skilled in the art can 
use the polypeptides of the present invention for commercial, therapeutic and 
35 industrial purposes consistent with the type of putative identification of the 



WO 98/18931 PCTYUS97/19588 



polypeptide. Such identifications permit one skilled in the art to use the 
Streptococcus pneumoniae ORFs in a manner similar to the known type of 
sequences for which the identification is made; for example, to ferment a particular 
sugar source or to produce a particular metabolite. A variety of reviews illustrative 
5 of this aspect of the invention are available, including the following reviews on the 
industrial use of enzymes, for example, BIOCHEMICAL ENGINEERING AND 
BIOTECHNOLOGY HANDBOOK, 2nd Ed., MacMillan Publications, Ltd. NY 
(1991) and BIOCATALYSTS IN ORGANIC SYNTHESES, Tramper et al, Eds., 
Elsevier Science Publishers, Amsterdam, The Netherlands (1985). A variety of 
10 exemplary uses that illustrate this and similar aspects of the present invention are 
discussed below. 

1. Biosynthetic Enzymes 

Open reading frames encoding proteins involved in mediating the catalytic 

15 reactions involved in intermediary and macromolecular metabolism, the 
biosynthesis of small molecules, cellular processes and other functions includes 
enzymes involved in the degradation of the intermediary products of metabolism, 
enzymes involved in central intermediary metabolism, enzymes involved in 
respiration, both aerobic and anaerobic, enzymes involved in fermentation, 

20 enzymes involved in ATP proton motor force conversion, enzymes involved in 
broad regulatory function, enzymes involved in amino acid synthesis, enzymes 
involved in nucleotide synthesis, enzymes involved in cofactor and vitamin 
synthesis, can be used for industrial biosynthesis. 

The various metabolic pathways present in Streptococcus pneumoniae can 

25 be identified based on absolute nutritional requirements as well as by examining the 
various enzymes identified in Table 1-3 and SEQ ID NOS: 1-391. 

Of particular interest are polypeptides involved in the degradation of 
intermediary metabolites as well as non-macromolecular metabolism. Such 
enzymes include amylases, glucose oxidases, and catalase. 

30 Proteolytic enzymes are another class of commercially important enzymes. 

Proteolytic enzymes find use in a number of industrial processes including the 
processing of flax and other vegetable fibers, in the extraction, clarification and 
depectinization of fruit juices, in the extraction of vegetables' oil and in the 
maceration of fruits and vegetables to give unicellular fruits. A detailed review of 

35 the proteolytic enzymes used in the food industry is provided in Rombouts et al, 
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Symbiosis 21:19 (1986) and Voragen et al. in Biocatalysts In Agricultural 
Biotechnology, Whitaker et al, Eds., American Chemical Society Symposium 
Series 389:93 (1989) . 

The metabolism of sugars is an important aspect of the primary metabolism 
5 of Streptococcus pneumoniae. Enzymes involved in the degradation of sugars, 
such as, particularly, glucose, galactose, fructose and xylose, can be used in 
industrial fermentation. Some of the important sugar transforming enzymes, from 
a commercial viewpoint, include sugar isomerases such as glucose isomerase. 
Other metabolic enzymes have found commercial use such as glucose oxidases 

10 which produces ketogulonic acid (KG A). KGA is an intermediate in the 
commercial production of ascorbic acid using the Reichstein's procedure, as 
described in Krueger et al., Biotechnology 6(A) , Rhine et al., Eds., Verlag Press, 
Weinheim, Germany (1984). 

Glucose oxidase (GOD) is commercially available and has been used in 

15 purified form as well as in an immobilized form for the deoxygenation of beer. 
See, for instance, Hartmeir et al., Biotechnology Letters 1:21 (1979). The most 
important application of GOD is the industrial scale fermentation of gluconic acid. 
Market for gluconic acids which are used in the detergent, textile, leather, 
photographic, pharmaceutical, food, feed and concrete industry, as described, for 

20 example, in Bigelis et al., beginning on page 357 in GENE MANIPULATIONS 
AND FUNGI; Benett et al., Eds., Academic Press, New York (1985). In addition 
to industrial applications, GOD has found applications in medicine for quantitative 
determination of glucose in body fluids recently in biotechnology for analyzing 
syrups from starch and cellulose hydrosylates. This application is described in 

25 Owusu et al. , Biochem. et Biophysica. Acta. 872:83 (1986), for instance. 

The main sweetener used in the world today is sugar which comes from 
sugar beets and sugar cane. In the field of industrial enzymes, the glucose 
isomerase process shows the largest expansion in the market today. Initially, 
soluble enzymes were used and later immobilized enzymes were developed 

30 (Krueger et al., Biotechnology, The Textbook of Industrial Microbiology, Sinauer 
Associated Incorporated, Sunderland, Massachusetts (1990)). Today, the use of 
glucose- produced high fructose syrups is by far the largest industrial business 
using immobilized enzymes. A review of the industrial use of these enzymes is 
provided by Jorgensen, Starch 40:307 (1988). 
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Proteinases, such as alkaline serine proteinases, are used as detergent 
additives and thus represent one of the largest volumes of microbial enzymes used 
in the industrial sector. Because of their industrial importance, there is a large body 
of published and unpublished information regarding the use of these enzymes in 
5 industrial processes. (See Faultman et al, Acid Proteases Structure Function and 
Biology, Tang, J., ed., Plenum Press, New York (1977) and Godfrey et al, 
Industrial Enzymes, MacMillan Publishers, Surrey, UK (1983) and Hepner et al, 
Report Industrial Enzymes by 1990, Hel Hepner & Associates, London (1986)). 

Another class of commercially usable proteins of the present invention are 
10 the microbial lipases, described by, for instance, Macrae et al, Philosophical 
Transactions of the Chiral Society of London 310:221 (1985) and Poserke, Journal 
of the American Oil Chemist Society 67.1758 (1984). A major use of lipases is in 
the fat and oil industry for the production of neutral glycendes using lipase 
catalyzed inter-esterification of readily available triglycerides. Application of 
15 lipases include the use as a detergent additive to facilitate the removal of fats from 
fabrics in the course of the washing procedures. 

The use of enzymes, and in particular microbial enzymes, as catalyst for 
key steps in the synthesis of complex organic molecules is gaining popularity at a 
great rate. One area of great interest is the preparation of chiral intermediates. 
20 Preparation of chiral intermediates is of interest to a wide range of synthetic 
chemists particularly those scientists involved with the preparation of new 
pharmaceuticals, agrochemicals, fragrances and flavors. (See Davies et al. Recent 
Advances in the Generation of Chiral Intermediates Using Enzymes, CRC Press, 
Boca Raton, Florida (1990)). The following reactions catalyzed by enzymes are of 
25 interest to organic chemists: hydrolysis of carboxylic acid esters, phosphate esters, 
amides and nitriles, esterification reactions, trans-esterification reactions, synthesis 
of amides, reduction of alkanones and oxoalkanates, oxidation of alcohols to 
carbonyl compounds, oxidation of sulfides to sulfoxides, and carbon bond forming 
reactions such as the aldol reaction. 
30 When considering the use of an enzyme encoded by one of the ORFs of the 

present invention for biotransformation and organic synthesis it is sometimes 
necessary to consider the respective advantages and disadvantages of using a 
microorganism as opposed to an isolated enzyme. Pros and cons of using a whole 
cell system on the one hand or an isolated partially purified enzyme on the other 
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hand, has been described in detail by Bud et al, Chemistry in Britain (1987), p. 
127. 

Amino transferases, enzymes involved in the biosynthesis and metabolism 
of amino acids, are useful in the catalytic production of amino acids. The 

5 advantages of using microbial based enzyme systems is that the amino transferase 
enzymes catalyze the stereo- selective synthesis of only L-amino acids and 
generally possess uniformly high catalytic rates. A description of the use of amino 
transferases for amino acid production is provided by Roselle-David, Methods of 
Enzymology 136:479 (1987). 

10 Another category of useful proteins encoded by the ORFs of the present 

invention include enzymes involved in nucleic acid synthesis, repair, and 
recombination. 

2. Generation of Antibodies 

15 As described here, the proteins of the present invention, as well as 

homologs thereof, can be used in a variety of procedures and methods known in 
the art which are currently applied to other proteins. The proteins of the present 
invention can further be used to generate an antibody which selectively binds the 
protein. Such antibodies can be either monoclonal or polyclonal antibodies, as well 

20 fragments of these antibodies, and humanized forms. 

The invention further provides antibodies which selectively bind to one of 
the proteins of the present invention and hybridomas which produce these 
antibodies. A hybridoma is an immortalized cell line which is capable of secreting 
a specific monoclonal antibody. 

25 In general, techniques for preparing polyclonal and monoclonal antibodies 

as well as hybridomas capable of producing the desired antibody are well known in 
the art (Campbell, A. M., Monoclonal Antibody Technology: Laboratory 
Techniques In Biochemistry And Molecular Biology, Elsevier Science Publishers, 
Amsterdam, The Netherlands (1984); St. Groth etal., J. Immunol. Methods 35: l- 

30 21 (1980), Kohler and Milstein, Nature 256:495-497 (1975)), the trioma 
technique, the human B-cell hybridoma technique (Kozbor et al, Immunology 
Today 4:12 (1983), pgs. 77-96 of Cole et al, in Monoclonal Antibodies And 
Cancer Therapy, Alan R. Liss, Inc. (1985)). Any animal (mouse, rabbit, 

etc.) which is known to produce antibodies can be immunized with the pseudogene 

35 polypeptide. Methods for immunization are well known in the art. Such methods 
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include subcutaneous or interperitoneal injection of the polypeptide. One skilled in 
the art will recognize that the amount of the protein encoded by the ORF of the 
present invention used for immunization will vary based on the animal which is 
immunized, the antigenicity of the peptide and the site of injection. 
5 The protein which is used as an immunogen may be modified or 

administered in an adjuvant in order to increase the protein's antigenicity. Methods 
of increasing the antigenicity of a protein are well known in the art and include, but 
are not limited to coupling the antigen with a heterologous protein (such as globulin 
or galactosidase) or through the inclusion of an adjuvant during immunization. 

10 For monoclonal antibodies, spleen cells from the immunized animals are 

removed, fused with myeloma cells, such as SP2/0-Agl4 myeloma cells, and 
allowed to become monoclonal antibody producing hybridoma cells. 

Any one of a number of methods well known in the art can be used to 
identify the hybridoma cell which produces an antibody with the desired 

15 characteristics. These include screening the hybridomas with an ELISA assay, 
western blot analysis, or radioimmunoassay (Lutz et ai, Exp. Cell Res. 175:109- 
124 (1988)). 

Hybridomas secreting the desired antibodies are cloned and the class and 
subclass is determined using procedures known in the art (Campbell, A. M., 
20 Monoclonal Antibody Technology: Laboratory Techniques in Biochemistry and 
Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands 
(1984)). 

Techniques described for the production of single chain antibodies (U. S . 
Patent 4,946,778) can be adapted to produce single chain antibodies to proteins of 
25 the present invention. 

For polyclonal antibodies, antibody containing antisera is isolated from the 
immunized animal and is screened for the presence of antibodies with the desired 
specificity using one of the above-described procedures. 

The present invention further provides the above- described antibodies in 
30 detectably labelled form. Antibodies can be detectably labelled through the use of 
radioisotopes, affinity labels (such as biotin, avidin, etc.), enzymatic labels (such 
as horseradish peroxidase, alkaline phosphatase, etc.) fluorescent labels (such as 
FITC or rhodamine, etc.), paramagnetic atoms, etc. Procedures for accomplishing 
such labeling are well-known in the art, for example see Sternberger et ai, J. 
35 Histochem. Cytochem. 18:315 (1970); Bayer, E. A. et ai, Meth. Enzym. 62:308 
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(1979); Engval, E. et ai, Immunol. 109:129 (1972); Goding, J. W., /. Immunol. 
Meth. 75:215 (1976)). 

The labeled antibodies of the present invention can be used for in vitro, in 
vivo, and in situ assays to identify cells or tissues in which a fragment of the 
5 Streptococcus pneumoniae genome is expressed. 

The present invention further provides the above-described antibodies 
immobilized on a solid support. Examples of such solid supports include plastics 
such as polycarbonate, complex carbohydrates such as agarose and sepharose, 
acrylic resins and such as polyacrylamide and latex beads. Techniques for 

10 coupling antibodies to such solid supports are well known in the art (Weir, D. M. 
et ai, "Handbook of Experimental Immunology" 4th Ed., Blackwell Scientific 
Publications, Oxford, England, Chapter 10 (1986); Jacoby, W. D. et al, Meth. 
Enzym. 34 Academic Press, N. Y. (1974)). The immobilized antibodies of the 
present invention can be used for in vitro, in vivo, and in situ assays as well as for 

1 5 immunoaffinity purification of the proteins of the present invention. 

3. Diagnostic Assays and Kits 

The present invention further provides methods to identify the expression 
of one of the ORFs of the present invention, or homolog thereof, in a test sample, 

20 using one of the DFs or antibodies of the present invention. 

In detail, such methods comprise incubating a test sample with one or more 
of the antibodies or one or more of the DFs of the present invention and assaying 
for binding of the DFs or antibodies to components within the test sample. 

Conditions for incubating a DF or antibody with a test sample vary. 

25 Incubation conditions depend on the format employed in the assay, the detection 
methods employed, and the type and nature of the DF or antibody used in the 
assay. One skilled in the art will recognize that any one of the commonly available 
hybridization, amplification or immunological assay formats can readily be adapted 
to employ the DFs or antibodies of the present invention. Examples of such assays 

30 can be found in Chard, T., An Introduction to Radioimmunoassay and Related 
Techniques, Elsevier Science Publishers, Amsterdam, The Netherlands (1986); 
Bullock, G. R. et al., Techniques in Immunocytochemistry, Academic Press, 
Orlando, FL Vol. 1 (1982), Vol. 2 (1983), Vol. 3 (1985); Tijssen, P., Practice and 
Theory of Enzyme Immunoassays: Laboratory Techniques in Biochemistry and 
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Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands 
(1985). 

The test samples of the present invention include cells, protein or membrane 
extracts of cells, or biological fluids such as sputum, blood, serum, plasma, or 
5 urine. The test sample used in the above-described method will vary based on the 
assay format, nature of the detection method and the tissues, cells or extracts used 
as the sample to be assayed. Methods for preparing protein extracts or membrane 
extracts of cells are well known in the art and can be readily be adapted in order to 
obtain a sample which is compatible with the system utilized. 

10 In another embodiment of the present invention, kits are provided which 

contain the necessary reagents to carry out the assays of the present invention. 

Specifically, the invention provides a compartmentalized kit to receive, in 
close confinement, one or more containers which comprises: (a) a first container 
comprising one of the DFs or antibodies of the present invention; and (b) one or 

15 more other containers comprising one or more of the following: wash reagents, 
reagents capable of detecting presence of a bound DF or antibody. 

In detail, a compartmentalized kit includes any kit in which reagents are 
contained in separate containers. Such containers include small glass containers, 
plastic containers or strips of plastic or paper. Such containers allows one to 

20 efficiently transfer reagents from one compartment to another compartment such 
that the samples and reagents are not cross-contaminated, and the agents or 
solutions of each container can be added in a quantitative fashion from one 
compartment to another. Such containers will include a container which will accept 
the test sample, a container which contains the antibodies used in the assay, 

25 containers which contain wash reagents (such as phosphate buffered saline, Tris- 
buffers, etc.), and containers which contain the reagents used to detect the bound 
antibody or DF. 

Types of detection reagents include labelled nucleic acid probes, labelled 
secondary antibodies, or in the alternative, if the primary antibody is labelled, the 
30 enzymatic, or antibody binding reagents which are capable of reacting with the 
labelled antibody. One skilled in the art will readily recognize that the disclosed 
DFs and antibodies of the present invention can be readily incorporated into one of 
the established kit formats which are well known in the art. 

35 4. Screening Assay for Binding Agents 
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Using the isolated proteins of the present invention, the present invention 
further provides methods of obtaining and identifying agents which bind to a 
protein encoded by one of the ORFs of the present invention or to one of the 
fragments and the Streptococcus pneumoniae fragment and contigs herein 
5 described. 

In general, such methods comprise steps of: 

(a) contacting an agent with an isolated protein encoded by one of the 
ORFs of the present invention, or an isolated fragment of the Streptococcus 
pneumoniae genome; and 

10 (b) determining whether the agent binds to said protein or said fragment. 

The agents screened in the above assay can be, but are not limited to, 
peptides, carbohydrates, vitamin derivatives, or other pharmaceutical agents. The 
agents can be selected and screened at random or rationally selected or designed 
using protein modeling techniques. 

15 For random screening, agents such as peptides, carbohydrates, 

pharmaceutical agents and the like are selected at random and are assayed for their 
ability to bind to the protein encoded by the ORF of the present invention. 

Alternatively, agents may be rationally selected or designed. As used 
herein, an agent is said to be "rationally selected or designed" when the agent is 

20 chosen based on the configuration of the particular protein. For example, one 
skilled in the art can readily adapt currently available procedures to generate 
peptides, pharmaceutical agents and the like capable of binding to a specific peptide 
sequence in order to generate rationally designed antipeptide peptides, for example 
see Hurby et a/., "Application of Synthetic Peptides: Antisense Peptides," in 

25 Synthetic Peptides, A User's Guide, W. H. Freeman, NY (1992), pp. 289-307, 
and Kaspczak et al, Biochemistry 28:9230-8 (1989), or pharmaceutical agents, or 
the like. 

In addition to the foregoing, one class of agents of the present invention, as 
broadly described, can be used to control gene expression through binding to one 
30 of the ORFs or EMFs of the present invention. As described above, such agents 
can be randomly screened or rationally designed/selected. Targeting the ORF or 
EMF allows a skilled artisan to design sequence specific or element specific agents, 
modulating the expression of either a single ORF or multiple ORFs which rely on 
the same EMF for expression control. 
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One class of DNA binding agents are agents which contain base residues 
which hybridize or form a triple helix by binding to DNA or RNA. Such agents 
can be based on the classic phosphodiester, ribonucleic acid backbone, or can be a 
variety of sulfhydryl or polymeric derivatives which have base attachment capacity. 
5 Agents suitable for use in these methods usually contain 20 to 40 bases and 

are designed to be complementary to a region of the gene involved in transcription 
(triple helix - see Lee et ai, Nucl. Acids Res. 6:3073 (1979); Cooney et al, 
Science 241:456 (1988); and Dervan et al, Science 257:1360 (1991)) or to the 
mRNA itself (antisense - Okano, J. Neurochem. 56:560 (1991); 

10 Oligodeoxynucleotides as Antisense Inhibitors of Gene Expression, CRC Press, 
Boca Raton, FL (1988)). Triple helix- formation optimally results in a shut-off of 
RNA transcription from DNA, while antisense RNA hybridization blocks 
translation of an mRNA molecule into polypeptide. Both techniques have been 
demonstrated to be effective in model systems. Information contained in the 

15 sequences of the present invention can be used to design antisense and triple helix- 
forming oligonucleotides, and other DNA binding agents. 

5. Pharmaceutical Compositions and Vaccines 

The present invention further provides pharmaceutical agents which can be 

20 used to modulate the growth or pathogenicity of Streptococcus pneumoniae, or 
another related organism, in vivo or in vitro. As used herein, a "pharmaceutical 
agent" is defined as a composition of matter which can be formulated using known 
techniques to provide a pharmaceutical compositions. As used herein, the 
"pharmaceutical agents of the present invention" refers the pharmaceutical agents 

25 which are derived from the proteins encoded by the ORFs of the present invention 
or are agents which are identified using the herein described assays. 

As used herein, a pharmaceutical agent is said to "modulate the growth 
pathogenicity of Streptococcus pneumoniae or a related organism, in vivo or in 
vitro" when the agent reduces the rate of growth, rate of division, or viability of 

30 the organism in question. The pharmaceutical agents of the present invention can 
modulate the growth or pathogenicity of an organism in many fashions, although 
an understanding of the underlying mechanism of action is not needed to practice 
the use of the pharmaceutical agents of the present invention. Some agents will 
modulate the growth by binding to an important protein thus blocking the biological 

35 activity of the protein, while other agents may bind to a component of the outer 
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surface of the organism blocking attachment or rendering the organism more prone 
to act the bodies nature immune system. Alternatively, the agent may comprise a 
protein encoded by one of the ORFs of the present invention and serve as a 
vaccine. The development and use of a vaccine based on outer membrane 
5 components are well known in the art. 

As used herein, a "related organism" is a broad term which refers to any 
organism whose growth can be modulated by one of the pharmaceutical agents of 
the present invention. In general, such an organism will contain a homolog of the 
protein which is the target of the pharmaceutical agent or the protein used as a 
10 vaccine. As such, related organisms do not need to be bacterial but may be fungal 
or viral pathogens. 

The pharmaceutical agents and compositions of the present invention may 
be administered in a convenient manner, such as by the oral, topical, intravenous, 
intraperitoneal, intramuscular, subcutaneous, intranasal or intradermal routes. The 

15 pharmaceutical compositions are administered in an amount which is effective for 
treating and/or prophylaxis of the specific indication. In general, they are 
administered in an amount of at least about 1 mg/kg body weight and in most cases 
they will be administered in an amount not in excess of about 1 g/kg body weight 
per day. In most cases, the dosage is from about 0.1 mg/kg to about 10 g/kg body 

20 weight daily, taking into account the routes of administration, symptoms, etc. 

The agents of the present invention can be used in native form or can be 
modified to form a chemical derivative. As used herein, a molecule is said to be a 
"chemical derivative" of another molecule when it contains additional chemical 
moieties not normally a part of the molecule. Such moieties may improve the 

25 molecule's solubility, absorption, biological half life, etc. The moieties may 
alternatively decrease the toxicity of the molecule, eliminate or attenuate any 
undesirable side effect of the molecule, etc. Moieties capable of mediating such 
effects are disclosed in, among other sources, REMINGTON'S 
PHARMACEUTICAL SCIENCES (1980) cited elsewhere herein. 

30 F°r example, such moieties may change an immunological character of the 

functional derivative, such as affinity for a given antibody. Such changes in 
immunomodulation activity are measured by the appropriate assay, such as a 
competitive type immunoassay. Modifications of such protein properties as redox 
or thermal stability, biological half-life, hydrophobicity, susceptibility to proteolytic 

35 degradation or the tendency to aggregate with carriers or into multimers also may 
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be effected in this way and can be assayed by methods well known to the skilled 
artisan. 

The therapeutic effects of the agents of the present invention may be 
obtained by providing the agent to a patient by any suitable means (e.g., inhalation, 
5 intravenously, intramuscularly, subcutaneously, enterally, or parenterally). It is 
preferred to administer the agent of the present invention so as to achieve an 
effective concentration within the blood or tissue in which the growth of the 
organism is to be controlled. To achieve an effective blood concentration, the 
preferred method is to administer the agent by injection. The administration may be 
10 by continuous infusion, or by single or multiple injections. 

In providing a patient with one of the agents of the present invention, the 
dosage of the administered agent will vary depending upon such factors as the 
patient's age, weight, height, sex, general medical condition, previous medical 
history, etc. In general, it is desirable to provide the recipient with a dosage of 
15 agent which is in the range of from about 1 pg/kg to 10 mg/kg (body weight of 
patient), although a lower or higher dosage may be administered. The 
therapeutically effective dose can be lowered by using combinations of the agents 
of the present invention or another agent. 

As used herein, two or more compounds or agents are said to be 
20 administered "in combination" with each other when either (1) the physiological 
effects of each compound, or (2) the serum concentrations of each compound can 
be measured at the same time. The composition of the present invention can be 
administered concurrently with, prior to, or following the administration of the 
other agent. 

25 The agents of the present invention are intended to be provided to recipient 

subjects in an amount sufficient to decrease the rate of growth (as defined above) of 
the target organism. 

The administration of the agent(s) of the invention may be for either a 
"prophylactic" or "therapeutic" purpose. When provided prophylactically, the 

30 agent(s) are provided in advance of any symptoms indicative of the organisms 
growth. The prophylactic administration of the agent(s) serves to prevent, 
attenuate, or decrease the rate of onset of any subsequent infection. When 
provided therapeutically, the agent(s) are provided at (or shortly after) the onset of 
an indication of infection. The therapeutic administration of the compound(s) 
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serves to attenuate the pathological symptoms of the infection and to increase the 
rate of recovery. 

The agents of the present invention are administered to a subject, such as a 
mammal, or a patient, in a pharmaceutically acceptable form and in a therapeutically 
5 effective concentration. A composition is said to be "pharmacologically acceptable" 
if its administration can be tolerated by a recipient patient. Such an agent is said to 
be administered in a "therapeutically effective amount" if the amount administered 
is physiologically significant. An agent is physiologically significant if its presence 
results in a detectable change in the physiology of a recipient patient. 
10 The agents of the present invention can be formulated according to known 

methods to prepare pharmaceutically useful compositions, whereby these materials, 
or their functional derivatives, are combined in a mixture with a pharmaceutically 
acceptable carrier vehicle. Suitable vehicles and their formulation, inclusive of 
other human proteins, e.g., human serum albumin, are described, for example, in 

15 REMINGTON'S PHARMACEUTICAL SCIENCES, 16th Ed>) 0sol A > Ed 
Mack Publishing, Easton PA (1980). In order to form a pharmaceutically 
acceptable composition suitable for effective administration, such compositions will 
contain an effective amount of one or more of the agents of the present invention, 
together with a suitable amount of carrier vehicle. 

20 Additional pharmaceutical methods may be employed to control the duration 

of action. Control release preparations may be achieved through the use of 
polymers to complex or absorb one or more of the agents of the present invention. 
The controlled delivery may be effectuated by a variety of well known techniques, 
including formulation with macromolecules such as, for example, polyesters, 

25 polyamino acids, polyvinyl, pyrrolidone, ethylenevinylacetate, methylcellulose, 
carboxymethylcellulose, or protamine, sulfate, adjusting the concentration of the 
macromolecules and the agent in the formulation, and by appropriate use of 
methods of incorporation, which can be manipulated to effectuate a desired time 
course of release. Another possible method to control the duration of action by 

30 controlled release preparations is to incorporate agents of the present invention into 
particles of a polymeric material such as polyesters, polyamino acids, hydrogels, 
poly(lactic acid) or ethylene vinylacetate copolymers. Alternatively, instead of 
incorporating these agents into polymeric particles, it is possible to entrap these 
materials in microcapsules prepared, for example, by coacervation techniques or by 

35 interfacial polymerization with, for example, hydroxymethylcellulose or gelatine- 
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microcapsules and poly(methylmethacylate) microcapsules, respectively, or in 
colloidal drug delivery systems, for example, liposomes, albumin microspheres, 
microemulsions, nanoparticles, and nanocapsules or in macroemulsions. Such 
techniques are disclosed in REMINGTONS PHARMACEUTICAL SCIENCES 
5 (1980). 

The invention further provides a pharmaceutical pack or kit comprising one 
or more containers filled with one or more of the ingredients of the pharmaceutical 
compositions of the invention. Associated with such container(s) can be a notice in 
the form prescribed by a governmental agency regulating the manufacture, use or 
10 sale of pharmaceuticals or biological products, which notice reflects approval by 
the agency of manufacture, use or sale for human administration. 

In addition, the agents of the present invention may be employed in 
conjunction with other therapeutic compounds. 

15 6. Shot-Gun Approach to Megabase DNA Sequencing 

The present invention further demonstrates that a large sequence can be 
sequenced using a random shotgun approach. This procedure, described in detail 
in the examples that follow, has eliminated the up front cost of isolating and 
ordering overlapping or contiguous subclones prior to the start of the sequencing 
20 protocols. 

Certain aspects of the present invention are described in greater detail in the 
examples that follow. The examples are provided by way of illustration. Other 
aspects and embodiments of the present invention are contemplated by the 
inventors, as will be clear to those of skill in the art from reading the present 
25 disclosure. 

ILLUSTRATIVE EXAMPLES 

LIBRARIES AND SEQUENCING 
30 1. Shotgun Sequencing Probability Analysis 

The overall strategy for a shotgun approach to whole genome sequencing 

follows from the Lander and Waterman (Landerman and Waterman, Genomics 

2:231 (1988)) application of the equation for the Poisson distribution. According 

to this treatment, the probability, P , that any given base in a sequence of size L, in 

35 nucleotides, is not sequenced after a certain amount, n, in nucleotides of random 
0 
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sequence has been determined can be calculated by the equation P = e" m , where m 
is L/n, the fold coverage. For instance, for a genome of 2.8 Mb, m=l when 2.8 
Mb of sequence has been randomly generated (IX coverage). APthat point, P = 
e' 1 = 0.37. The probability that any given base has not been sequenced is the same 
5 as the probability that any region of the whole sequence L has not been determined 
and, therefore, is equivalent to the fraction of the whole sequence that has yet to be 
determined. Thus, at one-fold coverage, approximately 37% of a polynucleotide of 
size L, in nucleotides has not been sequenced. When 14 Mb of sequence has been 
generated, coverage is 5X for a 2.8 Mb and the unsequenced fraction drops to 

10 .0067 or 0.67%. 5X coverage of a 2.8 Mb sequence can be attained by sequencing 
approximately 17,000 random clones from both insert ends with an average 
sequence read length of 410 bp. 

Similarly, the total gap length, G, is determined by the equation G = Le" m , 
and the average gap size, g, follows the equation, g = L/n. Thus, 5X coverage 

15 leaves about 240 gaps averaging about 82 bp in size in a sequence of a 
polynucleotide 2.8 Mb long. 

The treatment above is essentially that of Lander and Waterman, Genomics 
2: 231 (1988). 

20 2. Random Library Construction 

In order to approximate the random model described above during actual 
sequencing, a nearly ideal library of cloned genomic fragments is required. The 
following library construction procedure was developed to achieve this end. 

Streptococcus pneumoniae DNA is prepared by phenol extraction. A 

25 mixture containing 200 ug DNA in 1 .0 ml of 300 mM sodium acetate, 10 mM Tris- 
HC1, 1 mM Na-EDTA, 50% glycerol is processed through a nebulizer (IPI Medical 
Products) with a stream of nitrogen adjusted to 35 Kpa for 2 minutes. The 
sonicated DNA is ethanol precipitated and redissolved in 500 u.1 TE buffer. 

To create blunt-ends, a 100 |il aliquot of the resuspended DNA is digested 

30 with 5 units of BAL31 nuclease (New England BioLabs) for 10 min at 30°C in 200 
u.1 BAL31 buffer. The digested DNA is phenol-extracted, ethanol-precipitated, 
redissolved in 100 ul TE buffer, and then size-fractionated by electrophoresis 
through a 1.0% low melting temperature agarose gel. The section containing DNA 
fragments 1.6-2.0 kb in size is excised from the gel, and the LGT agarose is melted 

35 and the resulting solution is extracted with phenol to separate the agarose from the 
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DNA. DNA is ethanol precipitated and redissolved in 20 ul of TE buffer for 
ligation to vector. 

A two-step ligation procedure is used to produce a plasmid library with 
97% inserts, of which >99% were single inserts. The first ligation mixture (50 ul) 
5 contains 2 ug of DNA fragments, 2 ug pUC18 DNA (Pharmacia) cut with Smal 
and dephosphorylated with bacterial alkaline phosphatase, and 10 units of T4 ligase 
(GIBCO/BRL) and is incubated at 14°C for 4 hr. The ligation mixture then is 
phenol extracted and ethanol precipitated, and the precipitated DNA is dissolved in 
20 ul TE buffer and electrophoresed on a 1.0% low melting agarose gel. Discrete 

10 bands in a ladder are visualized by ethidium bromide-staining and UV illumination 
and identified by size as insert (I), vector (v), v+I, v+2i, v+3i, etc. The portion of 
the gel containing v+I DNA is excised and the v+I DNA is recovered and 
resuspended into 20 ul TE. The v+I DNA then is blunt-ended by T4 polymerase 
treatment for 5 min. at 37°C in a reaction mixture (50 ul) containing the v+I linears, 

15 500 uM each of the 4 dNTPs, and 9 units of T4 polymerase (New England 
BioLabs), under recommended buffer conditions. After phenol extraction and 
ethanol precipitation the repaired v+I linears are dissolved in 20 ju.1 TE. The final 
ligation to produce circles is carried out in a 50 |xl reaction containing 5 ul of v+I 
linears and 5 units of T4 ligase at 14°C overnight. After 10 min. at 70°C the 

20 following day, the reaction mixture is stored at -20°C. 

This two-stage procedure results in a molecularly random collection of 
single-insert plasmid recombinants with minimal contamination from double-insert 
chimeras (<1%) or free vector (<3%). 

Since deviation from randomness can arise from propagation the DNA in 

25 the host, E. coli host cells deficient in all recombination and restriction functions 
(A. Greener, Strategies 3 (1):5 (1990)) are used to prevent rearrangements, 
deletions, and loss of clones by restriction. Furthermore, transformed cells are 
plated directly on antibiotic diffusion plates to avoid the usual broth recovery phase 
which allows multiplication and selection of the most rapidly growing cells. 

30 Plating is carried out as follows. A 100 ul aliquot of Epicurian Coli SURE 

II Supercompetent Cells (Stratagene 200152) is thawed on ice and transferred to a 
chilled Falcon 2059 tube on ice. A 1.7 ul aliquot of 1.42 M beta-mercaptoethanol 
is added to the aliquot of cells to a final concentration of 25 mM. Cells are 
incubated on ice for 10 min. A 1 ul aliquot of the final ligation is added to the cells 

35 and incubated on ice for 30 min. The cells are heat pulsed for 30 sec. at 42°C and 
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placed back on ice for 2 min. The outgrowth period in liquid culture is eliminated 
from this protocol in order to minimize the preferential growth of any given 
transformed cell. Instead the transformation mixture is plated directly on a nutrient 
rich SOB plate containing a 5 ml bottom layer of SOB agar (5% SOB agar: 20 g 
5 tryptone, 5 g yeast extract, 0.5 g NaCl, 1 .5% Difco Agar per liter of media). The 5 
ml bottom layer is supplemented with 0.4 ml of 50 mg/ml ampicillin per 100 ml 
SOB agar. The 15 ml top layer of SOB agar is supplemented with 1 ml X-Gal 
(2%), 1 ml MgCl (1 M), and 1 ml MgSO /100 ml SOB agar. The 15 ml top layer 
is poured just prior to plating. Our titer is approximately 100 colonies/10 ul aliquot 
10 of transformation? 4 

All colonies are picked for template preparation regardless of size. Thus, 
only clones lost due to "poison" DNA or deleterious gene products are deleted from 
the library, resulting in a slight increase in gap number over that expected. 

15 3. Random DNA Sequencing 

High quality double stranded DNA plasmid templates are prepared using a 
"boiling bead" method developed in collaboration with Advanced Genetic 
Technology Corp. (Gaithersburg, MD) (Adams et al, Science 252:1651 (1991); 
Adams et ai, Nature 555:632 (1992)). Plasmid preparation is performed in a 96- 

20 well format for all stages of DNA preparation from bacterial growth through final 
DNA purification. Template concentration is determined using Hoechst Dye and a 
Millipore Cytofluor. DNA concentrations are not adjusted, but low-yielding 
templates are identified where possible and not sequenced. 

Templates are also prepared from two Streptococcus pneumoniae lambda 

25 genomic libraries. An amplified library is constructed in the vector Lambda GEM- 
12 (Promega) and an unamplified library is constructed in Lambda DASH II 
(Stratagene). In particular, for the unamplified lambda library, Streptococcus 
pneumoniae DNA (> 100 kb) is partially digested in a reaction mixture (200 ul) 
containing 50 ug DNA, IX Sau3AI buffer, 20 units Sau3AI for 6 min. at 23°C. 

30 The digested DNA was phenol-extracted and electrophoresed on a 0.5% low 
melting agarose gel at 2V/cm for 7 hours. Fragments from 15 to 25 kb are excised 
and recovered in a final volume of 6 ul. One ^1 of fragments is used with 1 (il of 
DASHII vector (Stratagene) in the recommended ligation reaction. One ui of the 
ligation mixture is used per packaging reaction following the recommended 

35 protocol with the Gigapack II XL Packaging Extract (Stratagene, #2277 11). Phage 
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are plated directly without amplification from the packaging mixture (after dilution 
with 500 ill of recommended SM buffer and chloroform treatment). Yield is about 
2.5xl0 3 pfu/ul. The amplified library is prepared essentially as above except the 
lambda GEM-12 vector is used. After packaging, about 3.5xl0 4 pfu are plated on 
5 the restrictive NM539 host. The lysate is harvested in 2 ml of SM buffer and 
stored frozen in 7% dimethylsulfoxide. The phage titer is approximately lxlO 9 
pfu/ml. 

Liquid lysates (100 pi) are prepared from randomly selected plaques (from 
the unamplified library) and template is prepared by long-range PCR using T7 and 

10 T3 vector-specific primers. 

Sequencing reactions are carried out on plasmid and/or PCR templates 
using the AB Catalyst LabStation with Applied Biosystems PRISM Ready 
Reaction Dye Primer Cycle Sequencing Kits for the M13 forward (M13-21) and 
the M13 reverse (M13RP1) primers (Adams et al, Nature 368:474 (1994)). Dye 

15 terminator sequencing reactions are carried out on the lambda templates on a 
Perkin-Elmer 9600 Thermocycler using the Applied Biosystems Ready Reaction 
Dye Terminator Cycle Sequencing kits. T7 and SP6 primers are used to sequence 
the ends of the inserts from the Lambda GEM-12 library and T7 and T3 primers are 
used to sequence the ends of the inserts from the Lambda DASH II library. 

20 Sequencing reactions are performed by eight individuals using an average of 
fourteen AB 373 DNA Sequencers per day. All sequencing reactions are analyzed 
using the Stretch modification of the AB 373, primarily using a 34 cm well-to-read 
distance. The overall sequencing success rate very approximately is about 85% for 
M13-21 and M13RP1 sequences and 65% for dye-terminator reactions. The 

25 average usable read length is 485 bp for M13-21 sequences, 445bp for M13RP1 
sequences, and 375 bp for dye-terminator reactions. 

Richards et al., Chapter 28 in AUTOMATED DNA SEQUENCING AND 
ANALYSIS, M. D. Adams, C. Fields, J. C. Venter, Eds., Academic Press, 
London, (1994) described the value of using sequence from both ends of 

30 sequencing templates to facilitate ordering of contigs in shotgun assembly projects 
of lambda and cosmid clones. We balance the desirability of both-end sequencing 
(including the reduced cost of lower total number of templates) against shorter 
read-lengths for sequencing reactions performed with the M13RP1 (reverse) primer 
compared to the Ml 3-21 (forward) primer. Approximately one-half of the 

35 templates are sequenced from both ends. Random reverse sequencing reactions are 
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done based on successful forward sequencing reactions. Some M13RP1 
sequences are obtained in a semi-directed fashion: Ml 3-21: sequences pointing 
outward at the ends of contigs are chosen for M13RP1 sequencing in an effort to 
specifically order contigs. 

5 

4. Protocol for Automated Cycle Sequencing 

The sequencing is carried out using ABI Catalyst robots and AB 373 
Automated DNA Sequencers. The Catalyst robot is a publicly available 
sophisticated pipetting and temperature control robot which has been developed 

10 specifically for DNA sequencing reactions. The Catalyst combines pre-aliquoted 
templates and reaction mixes consisting of deoxy- and dideoxynucleotides, the 
thermostable Taq DNA polymerase, fluorescently-labelled sequencing primers, and 
reaction buffer. Reaction mixes and templates are combined in the wells of an 
aluminum 96-well thermocycling plate. Thirty consecutive cycles of linear 

15 amplification (i.e.., one primer synthesis) steps are performed including 
denaturation, annealing of primer and template, and extension; i.e., DNA 
synthesis. A heated lid with rubber gaskets on the thermocycling plate prevents 
evaporation without the need for an oil overlay. 

Two sequencing protocols are used: one for dye-labelled primers and a 

20 second for dye-labelled dideoxy chain terminators. The shotgun sequencing 
involves use of four dye-labelled sequencing primers, one for each of the four 
terminator nucleotide. Each dye-primer is labelled with a different fluorescent dye, 
permitting the four individual reactions to be combined into one lane of the 373 
DNA Sequencer for electrophoresis, detection, and base-calling. ABI currently 

25 supplies pre-mixed reaction mixes in bulk packages containing all the necessary 
non-template reagents for sequencing. Sequencing can be done with both plasmid 
and PCR- generated templates with both dye-primers and dye- terminators with 
approximately equal fidelity, although plasmid templates generally give longer 
usable sequences. 

30 Thirty-two reactions are loaded per AB373 Sequencer each day, for a total 

of 960 samples. Electrophoresis is run overnight following the manufacturer's 
protocols, and the data is collected for twelve hours. Following electrophoresis 
and fluorescence detection, the ABI 373 performs automatic lane tracking and base- 
calling. The lane-tracking is confirmed visually. Each sequence electropherogram 

35 (or fluorescence lane trace) is inspected visually and assessed for quality. Trailing 
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sequences of low quality are removed and the sequence itself is loaded via software 
to a Sybase database (archived daily to 8mm tape). Leading vector polylinker 
sequence is removed automatically by a software program. Average edited lengths 
of sequences from the standard ABI 373 are around 400 bp and depend mostly on 
5 the quality of the template used for the sequencing reaction. ABI 373 Sequencers 
converted to Stretch Liners provide a longer electrophoresis path prior to 
fluorescence detection and increase the average number of usable bases to 500-600 
bp. 

10 INFORMATICS 

1. Data Management 

A number of information management systems for a large-scale sequencing 
lab have been developed. (For review see, for instance, Kerlavage et ai, 
Proceedings of the Twenty-Sixth Annual Hawaii International Conference on 

15 System Sciences, IEEE Computer Society Press, Washington D. C, 585 (1993)) 
The system used to collect and assemble the sequence data was developed using the 
Sybase relational database management system and was designed to automate data 
flow wherever possible and to reduce user error. The database stores and 
correlates all information collected during the entire operation from template 

20 preparation to final analysis of the genome. Because the raw output of the ABI 373 
Sequencers was based on a Macintosh platform and the data management system 
chosen was based on a Unix platform, it was necessary to design and implement a 
variety of multi- user, client-server applications which allow the raw data as well as 
analysis results to flow seamlessly into the database with a minimum of user effort. 

25 

2. Assembly 

An assembly engine (TIGR Assembler) developed for the rapid and 
accurate assembly of thousands of sequence fragments was employed to generate 
contigs. The TIGR assembler simultaneously clusters and assembles fragments of 

30 the genome. In order to obtain the speed necessary to assemble more than 10 4 
fragments, the algorithm builds a hash table of 12 bp oligonucleotide subsequences 
to generate a list of potential sequence fragment overlaps. The number of potential 
overlaps for each fragment determines which fragments are likely to fall into 
repetitive elements. Beginning with a single seed sequence fragment, TIGR 

35 Assembler extends the current contig by attempting to add the best matching 
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fragment based on oligonucleotide content. The contig and candidate fragment are 
aligned using a modified version of the Smith- Waterman algorithm which provides 
for optimal gapped alignments (Waterman, M. S., Methods in Enzymology 
164:165 (1988)). The contig is extended by the fragment only if strict criteria for 
5 the quality of the match are met. The match criteria include the minimum length of 
overlap, the maximum length of an unmatched end, and the minimum percentage 
match. These criteria are automatically lowered by the algorithm in regions of 
minimal coverage and raised in regions with a possible repetitive element. The 
number of potential overlaps for each fragment determines which fragments are 

10 likely to fall into repetitive elements. Fragments representing the boundaries of 
repetitive elements and potentially chimeric fragments are often rejected based on 
partial mismatches at the ends of alignments and excluded from the current contig. 
TIGR Assembler is designed to take advantage of clone size information coupled 
with sequencing from both ends of each template. It enforces the constraint that 

15 sequence fragments from two ends of the same template point toward one another 
in the contig and are located within a certain range of base pairs (definable for each 
clone based on the known clone size range for a given library). 

The process resulted in 391 contigs as represented by SEQ ID NOs: 1-391. 

20 3. Identifying Genes 

The predicted coding regions of the Streptococcus pneumoniae genome 
were initially defined with the program GeneMark, which finds ORFs using a 
probabilistic classification technique. The predicted coding region sequences were 
used in searches against a database of all nucleotide sequences from GenBank 

25 (October, 1997), using the BLASTN search method to identify overlaps of 50 or 
more nucleotides with at least a 95% identity. Those ORFs with nucleotide 
sequence matches are shown in Table 1. The ORFs without such matches were 
translated to protein sequences and compared to a non-redundant database of 
known proteins generated by combining the Swiss-prot, PIR and GenPept 

30 databases. ORFs that matched a database protein with BLASTP probability less 
than or equal to 0.01 are shown in Table 2. The table also lists assigned functions 
based on the closest match in the databases. ORFs that did not match protein or 
nucleotide sequences in the databases at these levels are shown in Table 3. 
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ILLUSTRATIVE APPLICATIONS 

1. Production of an Antibody to a Streptococcus pneumoniae 
Protein 

Substantially pure protein or polypeptide is isolated from the transfected or 
5 transformed cells using any one of the methods known in the art. The protein can 
also be produced in a recombinant prokaryotic expression system, such as E. coli, 
or can be chemically synthesized. Concentration of protein in the final preparation 
is adjusted, for example, by concentration on an Amicon filter device, to the level 
of a few micrograms/ml. Monoclonal or polyclonal antibody to the protein can 
10 then be prepared as follows. 

2. Monoclonal Antibody Production by Hybridoma Fusion 

Monoclonal antibody to epitopes of any of the peptides identified and 
isolated as described can be prepared from murine hybridomas according to the 

15 classical method of Kohler, G. and Milstein, C, Nature 256:495 (1975) or 
modifications of the methods thereof. Briefly, a mouse is repetitively inoculated 
with a few micrograms of the selected protein over a period of a few weeks. The 
mouse is then sacrificed, and the antibody producing cells of the spleen isolated. 
The spleen cells are fused by means of polyethylene glycol with mouse myeloma 

20 cells, and the excess unfused cells destroyed by growth of the system on selective 
media comprising aminopterin (HAT media). The successfully fused cells are 
diluted and aliquots of the dilution placed in wells of a microtiter plate where 
growth of the culture is continued. Antibody-producing clones are identified by 
detection of antibody in the supernatant fluid of the wells by immunoassay 

25 procedures, such as ELISA, as originally described by Engvall, E., Meth. 
Enzymol. 70:419 (1980), and modified methods thereof. Selected positive clones 
can be expanded and their monoclonal antibody product harvested for use. Detailed 
procedures for monoclonal antibody production are described in Davis, L. et ai, 
Basic Methods in Molecular Biology, Elsevier, New York. Section 21-2 (1989). 

30 
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3. Polyclonal Antibody Production by Immunization 

Polyclonal antiserum containing antibodies to heterogenous epitopes of a 
single protein can be prepared by immunizing suitable animals with the expressed 
protein described above, which can be unmodified or modified to enhance 
5 immunogenicity. Effective polyclonal antibody production is affected by many 
factors related both to the antigen and the host species. For example, small 
molecules tend to be less immunogenic than others and may require the use of 
carriers and adjuvant. Also, host animals vary in response to site of inoculations 
and dose, with both inadequate or excessive doses of antigen resulting in low titer 
10 antisera. Small doses (ng level) of antigen administered at multiple intradermal 
sites appears to be most reliable. An effective immunization protocol for rabbits 
can be found in Vaitukaitis, J. el «/., J. Clin. Endocrinol. Metab. 33:988-991 
(1971). 

Booster injections can be given at regular intervals, and antiserum harvested 

15 when antibody titer thereof, as determined semi-quantitatively, for example, by 
double immunodiffusion in agar against known concentrations of the antigen, 
begins to fall. See, for example, Ouchterlony, O. etai, Chap. 19 in: Handbook of 
Experimental Immunology, Wier, D., ed, Blackwell (1973). Plateau concentration 
of antibody is usually in the range of 0.1 to 0.2 mg/ml of serum (about 12M). 

20 Affinity of the antisera for the antigen is determined by preparing competitive 
binding curves, as described, for example, by Fisher, D., Chap. 42 in: Manual of 
Clinical Immunology, second edition, Rose and Friedman, eds.. Amer. Soc. For 
Microbiology, Washington, D. C. (1980) 

Antibody preparations prepared according to either protocol are useful in 

25 quantitative immunoassays which determine concentrations of antigen -bearing 
substances in biological samples; they are also used semi- quantitatively or 
qualitatively to identify the presence of antigen in a biological sample. In addition, 
antibodies are useful in various animal models of pneumococcal disease as a means 
of evaluating the protein used to make the antibody as a potential vaccine target or 

30 as a means of evaluating the antibody as a potential immunotherapeutic or 
immunoprophylactic reagent. 
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4. Preparation of PCR Primers and Amplification of DNA 

Various fragments of the Streptococcus pneumoniae genome, such as those 
of Tables 1-3 and SEQ ID NOS:l-391 can be used, in accordance with the present 
invention, to prepare PCR primers for a variety of uses. The PCR primers are 
5 preferably at least 15 bases, and more preferably at least 1 8 bases in length. When 
selecting a primer sequence, it is preferred that the primer pairs have approximately 
the same G/C ratio, so that melting temperatures are approximately the same. The 
PCR primers and amplified DNA of this Example find use in the Examples that 
follow. 

10 

5. Gene expression from DNA Sequences Corresponding to 

ORFs 

A fragment of the Streptococcus pneumoniae genome provided in Tables 1 - 
3 is introduced into an expression vector using conventional technology. 

15 Techniques to transfer cloned sequences into expression vectors that direct protein 
translation in mammalian, yeast, insect or bacterial expression systems are well 
known in the art. Commercially available vectors and expression systems are 
available from a variety of suppliers including Stratagene (La Jolla, California), 
Promega (Madison, Wisconsin), and Invitrogen (San Diego, California). If 

20 desired, to enhance expression and facilitate proper protein folding, the codon 
context and codon pairing of the sequence may be optimized for the particular 
expression organism, as explained by Hatfield et aL, U. S. Patent No. 5,082,767, 
incorporated herein by this reference. 
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The following is provided as one exemplary method to generate 
polypeptide(s) from cloned ORFs of the Streptococcus pneumoniae genome 
fragment. Bacterial ORFs generally lack a poly A addition signal. The addition 
signal sequence can be added to the construct by, for example, splicing out the poly 
5 A addition sequence from pSG5 (Stratagene) using Bgll and Sail restriction 
endonuclease enzymes and incorporating it into the mammalian expression vector 
pXTl (Stratagene) for use in eukaryotic expression systems. pXTl contains the 
LTRs and a portion of the gag gene of Moloney Murine Leukemia Virus. The 
positions of the LTRs in the construct allow efficient stable transfection. The 

10 vector includes the Herpes Simplex thymidine kinase promoter and the selectable 
neomycin gene. The Streptococcus pneumoniae DNA is obtained by PCR from the 
bacterial vector using oligonucleotide primers complementary to the Streptococcus 
pneumoniae DNA and containing restriction endonuclease sequences for PstI 
incorporated into the 5' primer and Bglll at the 5' end of the corresponding 

15 Streptococcus pneumoniae DNA 3' primer, taking care to ensure that the 
Streptococcus pneumoniae DNA is positioned such that its followed with the poly 
A addition sequence. The purified fragment obtained from the resulting PCR 
reaction is digested with PstI, blunt ended with an exonuclease, digested with 
Bglll, purified and ligated to pXTl, now containing a poly A addition sequence 

20 and digested Bglll. 

The ligated product is transfected into mouse NIH 3T3 cells using 
Lipofectin (Life Technologies, Inc., Grand Island, New York) under conditions 
outlined in the product specification. Positive transfectants are selected after 
growing the transfected cells in 600 ug/ml G418 (Sigma, St. Louis, Missouri). 

25 The protein is preferably released into the supernatant. However if the protein has 
membrane binding domains, the protein may additionally be retained within the cell 
or expression may be restricted to the cell surface. Since it may be necessary to 
purify and locate the transfected product, synthetic 15-mer peptides synthesized 
from the predicted Streptococcus pneumoniae DNA sequence are injected into mice 

30 to generate antibody to the polypeptide encoded by the Streptococcus pneumoniae 
DNA. 
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Alternatively and if antibody production is not possible, the Streptococcus 
pneumoniae DNA sequence is additionally incorporated into eukaryotic expression 
vectors and expressed as, for example, a globin fusion. Antibody to the globin 
moiety then is used to purify the chimeric protein. Corresponding protease 
5 cleavage sites are engineered between the globin moiety and the polypeptide 
encoded by the Streptococcus pneumoniae DNA so that the latter may be freed 
from the formed by simple protease digestion. One useful expression vector for 
generating globin chimerics is pSG5 (Stratagene). This vector encodes a rabbit 
globin. Intron II of the rabbit globin gene facilitates splicing of the expressed 

10 transcript, and the polyadenylation signal incorporated into the construct increases 
the level of expression. These techniques are well known to those skilled in the art 
of molecular biology. Standard methods are published in methods texts such as 
Davis et al., cited elsewhere herein, and many of the methods are available from the 
technical assistance representatives from Stratagene, Life Technologies, Inc., or 

15 Promega. Polypeptides of the invention also may be produced using in vitro 
translation systems such as in vitro ExpressTM Translation Kit (Stratagene). 

While the present invention has been described in some detail for purposes 
of clarity and understanding, one skilled in the art will appreciate that various 
changes in form and detail can be made without departing from the true scope of 

20 the invention. 

All patents, patent applications and publications referred to above are 
hereby incorporated by reference. 
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(ii) TITLE OF INVENTION: Streptococcus pneumoniae Polynucleotides and Sequences 

(iii) NUMBER OF SEQUENCES: 391 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Human Genome Sciences, Inc. 

(B) STREET: 9410 Key West Avenue 

(C) CITY: Rockville 

(D) STATE: Maryland 

(E) COUNTRY: USA 

(F) ZIP: 20850 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Diskette, 3.50 inch, 1.4Mb storage 

(B) COMPUTER : HP Vectra 486/33 

(C) OPERATING SYSTEM: MSDOS version 6.2 

(D) SOFTWARE: ASCII Text 



(vi) CURRENT APPLICATION DATA: 
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(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(viii) ATTORNEY /AGENT INFORMATION: 

(A) NAME: Brookes, A. Anders 

(B) REGISTRATION NUMBER: 36,373 

(C) REFERENCE /DOCKET NUMBER: PB340P1 

(vi) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: (301) 309-8504 

(B) TELEFAX : (301) 309-8512 
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(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 562 5 base pair: 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 

CCAAGCAAAA CCAGCTACAG CTAAAGGAAC TTACGTAACA AACTTGACTA TCACAACTAC 60 

TCAAGGTGTT GGTATCAAAG TTGACGTAAA CTCACTTTAA TCAGTAGTTA AAGTAATGTA 120 

AAAAAGTTGA AGACGCTATG TCTCAACTTT TTTTGATGTA CGACGGGCAT GTTGTATAGT 180 

AGATGTGTAC TATTCTAGTT TCAATCTACT ATAGTAGCTC AGAAGTCGGT ACTTAAACGT 240 

GCTATATCAA AACCAGTCCT TGAAAAACGT GGACTGGTTT CGTGTTTGGA TTATTACCTT 300 

GAACGACATG CGTTAAAAGT TAGTTGAACC GCCGTATGCC GAACGGACGT ACGGTGGTGT 360 

GAGAGGGGCT AGAGATTATC CCCTACTCGA TTTCGAAATC TAGTGGAATG AATCTGGAAT 420 

AGTCCATCGA GCTTTCTAAT ACTCTTCGAA AATCTCTTCA AACCACGTCA ACGTCGCCTT 480 

GCCGTGCGTA TGGTTACTGA CTTCGTCAGT TCTATCCACA ACCTCAAAAC AGTGTTTTGA 540 

GCTGACTACG TCAGTTCCAT CTACAACCTC AAAACAGTGT TTTGAGCAAC CTGCGGCTAG 500 

TTTCCTAGTT TGCTCTTTGG TTTTCATTGA GTATAACACA TTGTTAGAAG TTGGTTTAAA 660 

TTTCCTAATC AGTTTGTTCA CATTTACCTT CGATATATTA TATCCCATAG TTAAGGTTGG 72 0 

TCATACAGAT GATTATAGTC ATGGAGCCGT AAAACTTAGT GTTTCTTTAG TTGACAAAGA 780 

TGCCATGAAA AAAATATTTG TAACTGTAAT AGGATATTTT GAAATAAATA TAGATGAAAA 840 

TATCACCGAT ATTCTATACG TAAATGGTAC TGCTATTCTT TATCTTTATT TACGTTCAAT 9 00 

TGTTTCAATA GTTTCGGCAA TTGATAGCAG TGAAGCAATG TTGCTACCTA TCATTAATGT 960 

TTTAGAGTTA CTAGATAAAT CTCAACCTTT TGAAGAAGAA TAATTTATTA GCTCACTAAA 1020 

TTGAGGGTAA GGAAAAGTAA AAGCAGTAAG AAAAATGTCT TGCATTATAC AGCAACCTTT 1080 

TGGGAATGAG TGGATGGATT GAATAAAATT TGATTAAGAG TGGATGATTT ATCTGTAGAT 1140 

TATTATTGGA CAGTTAGTCT TGAAGTAGTC TAAGAATTAG GTTATAATCA GTAGAAGCCT 12 00 

TGCTAATAAT GAGGAGGTTA GTTTATGTAT AGTAGACTGA ATCTAAAATA GTACGAAACA 12 60 

ATTGCTAAAA CATTTATAGA AATTAATTTT ACTTTCCCAA TCGATTTGTT CTCATCTTAT 13 2 0 

TTCAATCCGC TAT AT AT TAT GGTATCGAAT CTTCATCAGA ATGATAAAAT TAATCAATTG 13 80 

ATATCTGATT ACAAACAGAA TATGAAAGCT TTTTATATCA CTATTGAAAA ATTTATACGA 144 0 
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GATGATGAAA GCCTTAAGTG TTATTTTATA AAGGTTATTT CAAGTCGTTC CAAGGTAACA 1500 

AGTCTAGATC AGATTGAAGC TGATAAAACG ATACAAAGAA AATATTCAAG TGAGCTAAAA 1560 

AAATTTATTG GATTTTATAA TGAGATTATT TGTGAGGAAA ATAGTTTCCT ACATGTACGA 1620 

AAGAGGTGGT CGAGTTGGTT TAGGTAGTCG ATGCGTGAGT TGATAATTCT CAGGGTATGG 1680 

ACTTCTTTTT CATGAATGAG GTAAAAGAGC AGGTATTGTT TAGAGACAAT CATTCTGAGC 1740 

ATATTTTCTG GATAGAGGGA GTATCCGATT TTATGATCAA AGTTAATACC GCCCTCTGGT 1800 

G AG AAG AT G A GTAGGTTGGT AATTTAAACT ATTAAACAGA ATTTTTGATT AAAAGTATTA 1860 

TTTCATGAGA GAAATCCTAA TTTCACAATC CATAGGCAAA CGCTTGCATT TCGTTTTTTA 192 0 

TTGGACTATA ATAGGTTGGT ATAAAGCCTT CTGTAGTAAT AAAATGTAGA AGGTGTAGAA 1980 

AGTAAGGATT TAGAATATTT GTAGTTAAAA ACACAATGTT GCTATTCCTT ACGATAGGGA 2 040 

GATAGATATG GCAATGATAG AAGTGGAACA TCTTCAGAAA AATTTTGTGA AGACTGTTAA 2100 

GGAACCGGGC TTGAAGGGGG CTTTGCGCTC CTTTATTCAT CCTGAAAAGC AGACCTTTGA 2160 

AGCGGTCAAG GATTTGACCT TTGAGGTTCC AAAAGGGCAG ATTTTAGGAT TTATCGGGGC 2220 

AAATGGTGCT GGGAAGTCGA CAACCATTAA AATGCTGACA GGAATTTTGA AACCAACATC 2280 

TGGTTTTTGT CGGATTAACG GCAAGATTCC CCAGGACAAT CGGCAAGATT ATGTCAAAGA 23 40 

TATTGGCGTA GTCTTTGGAC AACGCACCCA GCTATGGTGG GATTTGGCTC TGCAAGAGAC 2 400 

CTACACTGTC TTAAAAGAGA TTTATGATGT GCCAGACTCG CTCTTTCATA AGCGTATGGA 2460 

CTTTTTGAAT GAAGTCTTGG ATTTGAAGGA CTTTATCAAG GATCCCGTGC GGACTCTTTC 2520 

ACTGGGACAA CGGATGCGGG CGGATATTGC GGCCTCCTTG CTCCACAATC CCAAGGTTCT 2 5 80 

TTTTTTAGAT GAGCCGACCA TTGGTTTC-GA CGTTTCGGTT AAGGATAATA TTCGTCGGGC 2640 

AATTACTCAG ATCAATCAAG AGGAAGAAAC TACCATTCTT TTGACCACTC ACGATTTGAG 2 7 00 

TGATATTGAG CAACTTTGTG ATCGGATTTT CATGATTGAC AAGGGGCAAG AGATTTTTGA 2 7 60 

TGGAACGGTG AGCCAACTCA AGGAGACCTT TGGTAAGATG AAGACTCTCT CTTTTGAACT 2 82 0 

GCTACCAGGT CAAAGTCATC TCGTCTCTCA CTATGACGGT CTGTCTGATA TGACCATTGA 2 880 

TAGACAAGGA AACAGCCTCA ACATTGAATT TGATAGTTCT CGCTACCAGT CAGCTGACAT 2 940 

TATCAAGCAA ACCCTGTCTG ATTTTGAAAT CCGCGATTTG AAGATGGTGG AT ACG GAT AT 3000 

TGAGGATATT ATCCGTCGCT TCTACCGAAA GGAGCTCTAG GATGATCAAA TTGTGGAGAC 3 060 

GTTATAAACC CTTTATCAAT GCAGGGGTTC AGGAGTTGAT TACTTACCGA GTCAACTTTA 3120 

TTCTCTATCG GATTGGCGAT GTCATGGGGG CTTTTGTGGC CTTTTATCTC TGGAAGGCTG 3180 
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TCTTTGATTC TTCGCAAGAG TCTTTGATTC AGGGCTTCAG TATGGCGGAT ATCACCCTCT 3 2 40 

ACATCATCAT GAGTTTTGTG ACCAATCTTC TGACTAGATC CGATTCGTCC TTTATGATTG 3300 

GGGAGGAGGT CAAGGATGGC TCCATTATCA TGCGTTTGTT GCGACCAGTG CATTTTGCGG 3 3 60 

CCTCCTATCT TTTCACCGAG CTTGGTTCCA AGTGGTTGAT TTTTATCAGC GTTGGCCTTC 3420 

CATTTTTAAG TGTCATTGTC TTGATGAAAA TCATATCGGG TCAAGGTATT GTAGAGGTGC 3 4 80 

TAGGATTAAC TGTCATTTAT CTTTTTAGCT TAACGCTCGC CTATCTGATT AACTTTTTCT 3540 

TTAATATTTG CTTTGGATTT TCAGCCTTTG TGTTTAAAAA TCTTTGGGGT TCCAACCTAC 3600 

TTAAGACTTC CATAGTGGCT TTTATGTCGG GGAGTTTGAT TCCCTTGGCA TTTTTTCCAA 3 660 

AGGTTGTTTC AGATATTCTC TCCTTTTTGC CTTTTTCATC CTTGATTTAT ACTCCAGTTA 3720 

TGATCATTGT TGGAAAATAC GATGCCAGTC AGATTCTTCA GGCACTCCTT TTGCAGTTCT 3 7 80 

TCTGGCTCTT AGTGATGGTG GGATTGTCTC AGTTAATTTG GAAACGGGTC CAGTCCTTTA 3 840 

TCACCATTCA AGGAGGTTAG TATGAAAAAA TATCAACGAA TGCATCTGAT TTTTATCAGA 3900 

CAATACATCA AACAAATCAT GGAATATAAG GTAGATTTTG TGGTTGGTGT CTTGGGAGTC 3960 

TTTCTGACTC AAGGCTTGAA TCTCTTGTTT CTCAATGTCA TCTTTCAACA TATTCCATTC 402 0 

CTAGAAGGCT GGACCTTTCA AGAGATAGCT TTCATTTATG GATTTTCCTT GATTCCCAAG 4080 

GGAATGGACC ATCTCTTTTT TGACAATCTC TGGGCACTAG GGCAACGCCT AGTCCGAAAA 414 0 

GGGGAGTTTG ACAAGTATCT GACTCGTCCC ATCAATCCTC TCTTTCACAT CCTAGTTGAA 4200 

ACCTTTCAGA TTGATGCCTT GGGTGAACTC TTAGTCGGTG GTATTTTATT GGGAACAACA 42 60 

GTGACCAGCA TTGTTTGGAC TCTTCCAAAA TTCCTGCTTT TCCTAGTTTG TATTCCTTTT 4320 

GCGACCTTGA TTTATACTTC TCTTAAAATC GCAACAGCCA GTATCGCCTT TTGGACTAAG 4380 

CAGTCAGGCG CCATGATTTA CATCTTCTAT ATGTTCAATG ACTTTGCTAA GTATCCGATT 444 0 

TCTATTTACA ATTCTCTTCT TCGTTGGTTG ATTAGCTTTA TCGTGCCTTT CGCCTTTACA 4500 

GCCTACTATC CAGCTAGCTA TTTCTTACAG GAAAAGGATG TGTTCTTTAA CGTAGGAGGT 45 6 0 

TTGATGTTGA TTTCTCTGGT TTTCTTTGTT ATTTCCCTTA AACTTTGGGA TAAGGGCTTA 4 620 

GATTCCTACG AAAGTGCGGG TTCGTAAAAG CTAAAGTAAG AC TAAAAT C A AGAAAGAAAC 468 0 

TTATGATGTT TGTAATTGAA GAAGTCAAGG ATGAAAATCA AAAAAAGGCA GTTGTCGCTG 4740 

AGGTTTTGAA GGATTTGCCA GAATGGTTTG GAATCCCAGA AAGCACACAA GCCTATATAG 4800 

AAGGAACCAC GACACTGCAA GTTTGGACCG CCTATCAGGA GAGTGATTTG ACTAGATTTG 4860 

TAAGCTTATC CTATTCGAGT GAAGATTGTG CAGAGATTGA TTGTCTCGGC GTAAAAAAGC 492 0 

TTATCAAGGT AGAAAAATTG GGAGCCAATT GCTTGCTACT TTAGAGAGTG AAGCTCGTAA 4980 
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AAAAGTTGGT TATCTGCAGG TCAAAACAGT GGCAGAAGGT TCTAATAAAG ATTATGATCG 5040 

AACAAATGAC TTTTATCGAG GTCTTGGCTT TAAAAAGTTA GAGATTTTTC CTCAACTATG 5100 

GAATCCGCAA AATCCTTGTC AGATTTTGAT TAAAAAGCTT GAATAATATT ACTTGACATC 5150 

TATTCTCAGA GTGCTATACT GTAAGTGTAA TCGCCGATTT AGCTTAGTTG GTAGAGCAAG 5220 

GCACTCGTAA AGCCTAGGTT ATAGGTAGAT AAACGACTGA GGATTTGAAA AAATAGATAG 52 80 

GTAGAAGATA ACCGTTAAGC CTTACTCTTA GCGGTTATTT ATATTGTTTA ATAGCGCTAA 5340 

TATTTTATCA ATTATGCCTG TTTTCGTGTT TCTGGTAGTT GTTCAAGTTT ATTGCTACTA 54 00 

TTTTTGATGG TATGAATGTG CTTATAATGT ATCCCGGTTA ACGAAAGTTT TGGACTTATA 5460 

CTCTTCGAAA ATCTCTTCAA ACCACGTCAA CGTCGCCTTG CCGTGCGTAT GGTTATGACT 552 0 

TCGTCAGTTC TATCCACAAC CTCAAAACAG TGTTTTGAGT GACTACGTCA GTTCCATCTA 5 5 80 

CAACCTCAAA ACACTGTTTT GCCCAATCTG CGGCTAGTTT CCTAG 5 62 5 
(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7571 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

CTCTCCAGCT TTCCTTGCGA GTTGGCCATG TTGTGTCTTT AAGAAGTCTA AAAATATCTC 6 0 

CAATAAAACG CATCGCTCTC TCCTATCTCG TTTCTCTGTG TGTAGTGTAC TTGCCACAAT 12 0 

GCTTACAAAA TTTATTTACT TCTAGTCGTG TAGGCTTGAG GTTTCCGCTG ATCTTGATTG 180 

AATAGTTTCT CGAACCACAA ACCGCACAAG CTAGGCTTGC TTTTTTTAGT GCCATAACGC 240 

CTCCATCTTA TCCATTATAA CAAGAAAGCT AGGCTTTGAC AAGCATCTTA GCGAAATAGA 300 

TTGACTATCG AATCCCATAT TGTTTGAGCC TTTTCCTTAA TCTTCGCATC TGAGATAGCC 3 60 

CGGCTAGCCT CATCTACTAG ACTTTGCGCA CGCCCTCGAA TATCAGACAA ATTATCATCT 42 0 

GTCTGGCTAT TATCATTGGT TTGTACTTGT CTTTTTGTAT TGGCTGGTGC AATTCCATTT 480 

TGCTTATAAG CATTTTCAAC CGTAAAGGTA CTTCCTGGCG TATAAGGTAA AATGGTATTG 54 0 

GCAATGTTTC TAAAGACATG AGCTGCACCG TTTGAAGTAG AGCCAGCTAG ATAGTGGTTT 600 

TCATCAGTGG TCGGAAAGCC AAGCCAGTGG CTAATCACTA CATCCGGAGT ATAACCAATT 660 

ACCCACTGGT CACTTGTGTA CTCCGGATTG AAAACTGCTT CAGTTGTTCC AGTTTTCCCT 72 0 
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G C CAT G AC AT AGTCTGCAGG CGATGAACTA ATACCGGTAC CGTTGGTGAA AGTCCCCAAC 780 

ATCATACTGG TCATCTTGTC AGCTACAGAC TTATCAATCA CCCGTTTTTG TGAATTTTTA 840 

TGACTCGCAA TAACTTGTCC ACTAGCATTT TCAATTCTAC TAATAAAATG AGCTTCAGGC 900 

ATTAAACCTT CATTTGCAAA GGCGGCGTAT GCTTGAGCCA TTTGAAGAGG GTTGGTTTCA 960 

ACACCGCTTC CCAAGGCGAC ACCAAGAACA CGGTCGACCT TTTCCATGTT GAGTCCGAAT 102 0 

TTTTCGCCTG CCTCAAAAGC CTTGTCGACA CCCAAATCAT TAACAGTGGC AACAGCAGGT 1080 

AGATTAAGCG ATTCTGCCAA GGCTTGATAC ATAGGAACTT CTCGACTCGT TTTGATCCCT 1140 

GCATAGTTAT CAACCTTATA GCTGTCATAC TGCATGGTAT GGTTATCCAA CTGCTTATTC 12 00 

AAAGCCCAGC TTGCTTCAAC TGCTGGCGTA TAAACAACTA AAGGCTTAAT TGTAGAACCA 1260 

GGACTACGCT TTGATTGGGT TGCATAGTTG AAATTCCGGA ATCCAGTTTT ATCATTGTCA 13 2 0 

GCAACTTGAC CGACAACTCC ACGAACTCCC CCTGTTTTCG GTTCGAGGGC TACACTTCCT 1380 

GATTGAGCAA ACGTTCCATC CTCTGCCCTC GGAAATAGCG ATGTGTTTTC ATAAACAATC 1440 

TGCATATTTG CTTGGTAGTT TTGGTCCAGC TCTGTGTAAA TGCGGTAGCC ATTATTGACA 1500 

ATCTCTTCCT CTGTTAGATT ATACTTGGAA ACAGCTTCAT TAACCACCGC ATCAAAATAA 15 60 

GAGGGGTAAC GGTAATCTGA GATTTTTCCT TCATACTTAT CGTGCAATTG CGAAGTCATA 1620 

TCAACTTCAG CAGCTTTGGT TTCTTGGTTT TTATCAATAT ATCCTGCTGC AACCATATTC 1680 

TGCAAGACAG TATCGCGCCG ATTAGTAGAA TCTTCTACGG AATTCAAGGG ATTATACAGT 17 40 

TCCGGCCCCT TGAGCATCCC TGCCAGAGTC GCAGCTTGAT CCAGACTCAC TTCTGATGCA 1800 

GAAACTCCAA AGTATTTCTT ACTCGCATCT TCTACACCCC ACACACCATT TCCAAAATAA 1860 

GCGTTGTTAA GGTACATGGT TAGAATTTGC TCCTTACTAT ATTTTTTGCT TAATTCTAAG 192 0 

GCAAGGAAAA ATTCTTTCGC TTTTCTCTCA ACAGTTTGAT CCTGCGATAA ATAGGCGTTT 19 80 

TTAGCCAGCT GTTGGGTAAT GGTAGAGCCA CCACCTGAAC GTCCAGCAGT GACAATAGCC 2 04 0 

AAGAAAAAAC GGCCATAGTT AATCCCGTCA TTTTTATAGA AAGAACGGTC TTCTGTCGCA 2100 

ATAACAGCAT TCTGCAAGTT TTTACTGATG TCAGTCAGCT CAACATAGGT TCCCTTTTGA 2160 

CCAGACAAGG CACCAGCCTC TTTTTCTTCA CGGTCAAAAA TAAGAGTCCG AGTTTTCAAG 222 0 

GCATTTTGCA AATCATTGAC ATTGGTCGAC TTGGCTACAG CAAACAAATA GATTCCAACT 22 8 0 

AGCAAGCCTG CACTCAAACC TAGTATAAGG ATAATCTTTG TTAGATGATA ACGACGCCAG 2340 

AATTTTCGAA TCGGACCTAC TTGGGCTAAT TTTTTTCGAT CACTACGAGA GCGACGTAAG 24 00 

ATAGTAGAAT CAGAGTCCTC TAGTTCACTT GTTTCTTTTT TAAAAAGAGA AAGAAATTTC 24 60 

TCAAATAATT TATCTAATTT CATGCGTTTA TTTTATCATC TTCATCATAG GAAGACAAGA 2520 
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ATTTAGCTAT TTCCTATCCA AATAGGGCTT TTTTTGTTAC AATATCTGTA TGCAATTCAC 2580 

ATTTACATTA CCCGCCTCTC TACCTCAAAT GACAGTAAAG CAATTACTTG AGGAACAACT 2 640 

CCTCATCCCT AGAAAAATCC GTCATTTTTT GAGAATCAAG AAACATATTT TGATAAATCA 27 00 

AGAAGAAGTC CACTGGAAGG AAATCGTAAA TCCTGGAGAT GTTTGCCAGT TGACTTTTGA 2 7 60 

CGAGGAAGAT TATTCCCAAA AGACGATCCC TTGGGGCAAC CCAGACTTAG TGCAGGAAGT 282 0 

TTATCAAGAT CAACACTTGA TTATTGTAAA CAAACCAGAG GGGATGAAAA CGCATGGTAA 2 880 

TCAACCAAAC GAAATTGCCC TTCTTAACCA TGTCAGTACC TATGTTGGCC AAACCTGCTA 2 940 

TGTCGTTCAT CGTCTGGACA TGGAAACCAG TGGCTTAGTT CTCTTTGCCA AAAATCCTTT 3 000 

TATCCTGCCC ATTCTCAATC GCTTATTGGA GAAAAAAGAG ATTTCTAGAG AATATTGGGC 3 0 60 

TCTAGTTGAT GGAAATATCA ACAGAAAAGA ACTTGTTTTC AGAGACAAAA TTGGACGTGA 3120 

TCGCCATGAT CGTAGAAAAA GAATAGTTGA TGCAAAAAAT GGGCAATATG CTGAAACGCA 3180 

TGTAAGCAGA TTAAAGCAAT TCTCAAACAA GACTTCCTTG GCTCATTGCA AGCTAAAGAC 324 0 

AGGGCGAACC CATCAGATTC GTGTGCACCT TTCGCATCAT AATCTTCCTA TCCTGGGAGA 3300 

CCCTCTCTAT AATAGTAAAT CAAAGACAAG CCGGCTTATG CTTCATGCCT TCCGACTTTC 33 60 

CTTTACCCAC CCACTTACTT TAGAGAAGCT AACTTTCACT ACCCTTTCAA ATACATTTGA 3420 

AAAAGAATTA AAAAAGAATG GATGATCGTG TCATCCATTT TTCCATATAA AAAAGCAAGA 3480 

CCACAAAGCC TTGCTTTCTA TCAACTCAAG AATTATTTAG CAATTTTTGC GAAGTATTCA 354 0 

AGAGTACGAA CAAGTTGTGC AGTGTATGAC ATTTCGTTGT CGTACCATGA TACAACTTTA 3600 

ACCAATTGTT TACCGTCAAC GTCAAGAACT TTAGTTTGAG TTGCGTCAAA CAATGAACCG 36 60 

TAAGACATAC CTACGATATC TGAAGATACG ATTGGATCTT CTGTGTAACC GTATGATTCG 372 0 

TTTGAAGCTG CTTTCATAGC TGCGTTCACT TCATCAACAG TAACGTTCTT TTCAAGAACT 3780 

GCTACCAATT CAGTAACTGA TCCAGTTGGA GTTGGAACGC GTTGTGCAGA TCCGTCAAGT 3 84 0 

TTACCATTCA ATTCTGGGAT TACAAGACCG ATAGCTTTTG CAGCACCAGT TGAGTTAGGA 3900 

ACGATGTTTG CAGCACCAGC GCGAGCACGG CGAAGGTCAC CACCACGGTG TGGTCCGTCA 3960 

AGGATCATTT GGTCACCAGT GTAAGCGTGG ATAGTAGTCA TCAATCCTTC AACAACACCA 402 0 

AAGTTGTCTT GAAGAGCTTT AGCCATTGGA GCCAAGCAGT TTGTAGTACA TGAAGCACCT 4080 

GAGATAACTG TTTCAGTACC GTCAAGAACG TCGTGGTTAG TGTTGAATAC AACTGTTTTA 414 0 

ACGTCGTTTC CACCAGGAGC AGTGATAACA ACTTTTTTAG CTCCACCTTT AAGGTGTTTT 4200 

TCAGCTGCTT CTTTCTTAGC AAAGAAACCA GTAGCTTCAA GAACGATTTC TACACCGTCA 4260 
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GTAGCCCAGT CGATTTGTTC TGGATCACGT TCAGCAGAAA CTTTGATGAA TTTACCGTTA 432 0 

ACTTCAAATC CACCTTCTTT AACTTCAACA GTACCGTCGA AACGACCTTG AGTTGTGTCG 4380 

TATTTCAACA AGTGTGCAAG CATAACTGGA TCTGTAAGGT CGTTGATGCG TGTAACTTCA 4440 

ACACCTTCTA CGTTTTGGAT ACGACGGAAA GCAAGACGAC CGATACGTCC GAAACCGTTA 4500 

ATACCAACTT TAACTACCAT TAGTGATTTC CTCCTTATGA AAATCATGAA ATTTTTATTG 4 560 

TGAAAAGAGT AACTTGAATC ACTACAAATC ACCTTTCAAC AAACCTATTA TACAACTATT 4 620 

TGAGTTGAAT TGCAAGTATG GCCATTGTTT TTCTATGTTA GTTTCTTTTT AAGACTGTAA 4 680 

ACCAAGGAAT CCCTTACTAT TCATAGCATA ACGATTCTAT AGGATCCATT TTACTAATCT 4740 

TACGCGCCGG GAAGTAGGCT GAGACATAAC CAAGTAATAG AGCGAAAACT AGAGTTCCTA 4 800 

AAACAGATAA AAGATTTAAT TTAAAAACCT TAGTGATGGA TGGGTAAAAG TGACTTACAA 4 860 

TCGCATTCGC CAAACTTCCC ACCCCTTGTG CAACCAAAAA TGCCAGCAGC AAGGCGATGC 4 920 

CTACAATCCA GATAGCCTCG TAAATAAAAA TTCCTTTGAC ATCACGATTC TGATAACCAA 4 9 80 

CTGCTTTCAT GACACCTATT TCCTTGGAAC GTTGCATGAT ATTGATGTAA ATAATGATAC 5040 

CAATCATAAC CGCTGCTACC ACAATAGCTT GTGATGAAAG CACAATCAAT AATCCCTGAA 5100 

TAACACGAAT AAAGGTAATC ACAATATCAA GAACTCTCTG TTGAGAAAGC ACAGTATACT 5160 

TCTTATTTTT CTGTAATTCT TCTGTTACTA CTTTTGTCTG TGATGGATCT TTGAGTTCCA 52 20 

AGATAAAATA AGATACAGCT TTCGTAAATC CAGCCTCTTT CAAAATCGTT TCCATTTGAT 52 80 

GAGACAGCAT GAAACTGTTG CTGTCCTCCA TGTCATCTTC ATCATTGATT ACACGTACAA 53 4 0 

TCTTCGTTTG AAATTGAGCA ATCTTACTAG TTTCGGCAGC ACTTTCTACA ATGCTGGCTG 54 00 

AGACTGATTT GCCAATAAGA TCATTAGCTG TCAAATTTTT TCCTGTCTGT TCATTCCAAT 54 6 0 

TTTTTAGTAA ACTGCTTGGA ATCGTTAATC CCTGTTCATT TGTATCAGTA TAGAGGGATC 5520 

CAGCCAACAC TTTGTCCGTC TCATTATTAC TAACAGAGAT ACTTGTATCA TCATAAAGAC 5 5 80 

TCACTACTTG AGCATAAGAA GGCATCGTTT GACTCAGATC CATTTCTTGC CCATCTATAG 5 640 

TAATATTTGA CATGTTCATC CCAAAAGGAC TCTCCAAATA TTTAATAGCT TCTTTCCCAA 57 00 

CTGTATCCGT GATATATAGT CAATTGAAAC AAGAGCAGGA TAAAAAAGCC TCGTAAAAGG 57 6 0 

TATTGCAACT TGGTAATACC TTTTTGAGGT GCTTTTTGAT ATGAGCCCAT GTTTTCTCAA 5820 

TAGGATTGTA CTCAGGCGAG TAGGGAGGAA GAGGTAAAAG TTTATGCCCA AACTCTTCGC 5880 

ATAAAAGTTC TAGCTTCCCC ATTCTATGGA ATCTTACATT ATCCATAATA ATAACCGATG 5940 

GTGTGTTTAA TGTTGGTAAG AGAAAATTCT GAAACCAAGC TTCAAAAAAG TCGCTCGTCA 6000 

TCGTCTCTTC GTAAGTCATT GGAGCGATTA ATTCACCATT TGTTAGACCT GCAACCAAAG 6060 
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AAATCCTCTG ATATCTTCTT CCAGATACTT TGCCTCTTAT TAATTGACCT TTTAATGAGC 612 0 

GACCATATTC TCGATAAAAA TAAGTATCGA ATCCTGTTTC GTCAATCTAA ACAGGTGCTA 618 0 

GGTGCTTTAA ACTATTAAAA TTCTTAAGAA ATAAGGCTAC TTTTTCTGGG TCTTGTTCAT 624 0 

AGTAGGTGTG GTTCTTTTTT CGAGTGTAGC CCATAGCTTT GAGCGTATAG TGGATGGTAG 63 00 

TTGGATGACA GCCAAATTCA GAAGCTATTT CAGTCAAATA AGCGTCTGGA TTGTCAGTAA 63 6 0 

GATAGTTTTT AAGTCTATCT CTATCAACCT TTCTTGGTTT TATTCCTTTT ACTTGGTGGT 642 0 

TTAGCTCTCC TGTTTTCTCT TTTAGCTTTA ACCAGCCATA AATGGTATTA CGTGAGATTT 6480 

GGAAAACGTG TGATGCTTCT GTTATACTAC CTGTTCGCTC ACAATAAGAG AGAACTTTTT 654 0 

TACGAAAATC TATTGAATAT GCCATAAAAA GATTATACCA CATTGTGTAC TATTTTTGGT 6600 

TCATTTTACT ATATTTGAAG AGGCGTTTAA ACTATCTGAC ATAAAACTCG TTCTAGAGGA 6660 

AAGACATCCT TTAAAAAGTT AGTTTATTTT ACAACTTAGA CATCAAGGTA GGTTAACCCC 572 0 

TTCATGGAAA AATCAAGACT CTTAGCACTA TGGGTTAAAC TACCACTGGA GACGTAATCA 678 0 

ATCGCTAAAC CACGAAAACG GCTAATAGTG GTCATATCAA TATTTCCAGA ACATTCAATC 6840 

CGAGAACGTC CTGCAATTAG GGTAATGGCC TGTTCAATCT GTTCCAATGA CATATTATCC 6900 

AACATGATAA TATCAGCACC CGCCGCCGCA GCTTCTTCGG CAGCAGCAAG GCTTTCCACT 5960 

TCCACCTCGA CCATTTTCAC AAAAGGGGCA TAGGCACGCG CTTGAGCAAT TGCCTTTTGA 702 0 

ACACTACCTA CTGCCGCAAT GTGATTGTCT TTTAGCAGGA TAGCATCTGA TAAATTAAAG 7080 

CGATGATTAT AGCCACCGCC AACTCTCACG GCATATTTCT CAAAAAGACG TAAATTAGGA 7140 

GTAGTTTTTC GAGTATCAAA TACCTTAATG CAATCATCGC CTAAGGCTTC TACATAAGCA 7200 

GCTGTCATCG AAGCAATCCC TGATAAATGT TGTAAAAAAT TCAAGGCAAC GCGTTCACAT 7260 

GTTAAGAGAC TTCTCACCGA GCCTATGATT TCTAAAACCA AATCGCCACT AGTCAAACGA 7 32 0 

TCCCCATCCT TAAATTGATG AGGATTCTGG AAGGTCACCT CGGCATCAAA TAGGGTAAAA 7380 

ACCCTTTGAA AAACGGTTAG CCCCGCTAAA ACACCAGCTT CCTTGGCAAA AAGCGACACC 7440 

TTGGCTTGGC CATGATGATC AAAAATGGCA TTGGTACTGT AATCTTCGGA ATGAACATCT 7 500 

TCTCGCAAGG CTGCTTTCAA TGTATCATCT ATTTGAAAAG GGGTTAAATC AGTTGAAATG 7560 

ATTGACATCA C 7 571 
(2) INFORMATION FOR SEQ ID NO : 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26385 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 
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(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

TTTGCTAGTG GCTTAAATTC TTCAGGAAAA TCAGGCGTAT CTAAAAGTCG TGTCGTTTTT 6 0 

GTTTCATCTA TATAAAGACT TCCTGCTCCC CCTACAACTA GAAAACGTGT CTGTGTTCCA 12 0 

GCAAGAAGCT GATTAAATAG TTCGATTGAT TTGCTGTGGA GCGGTAGCGT ATCTGGTGTA 180 

TAAGCACCAA ACGCTGAAAT AACAGCATCA AATCCAGTAA GATCATCTTT TGTCAACTCA 2 40 

AATAAATCTT TTTTAATAAT AGACTCAGCT TGACTTTTGT TTTCAGAACG AACAATAGCC 3 00 

GTTACTTCAT GTCCTCGTTT GACTGCTTCT TCAACAATTG CTTTCCCCGC TTGTCCATTT 360 

GCTGCAATAA CTGCTAGTTT CATTTTTTAT ACCTCTCTTG TTGTAATTAT TTTAGTTACA 420 

GAAATTGTGA CACTCTTAAT AATCAATGTC AATAGTCTTG CTTAATTATT ATCAAAATAT 4 80 

TTCTACCAAG AAAACTAACC ATGATTCTAG TGAAAAAAAA TCTTCTTTGT CAACAAATTT 54 0 

ACTTTCTTGT TTTAAACATG CTATAATAAT CATAGCAAGA GATCTAAGTT GTCTGTTTTT 600 

TTAAAACGAG GTGATTATCA TGCGTAGATT CTATTCCCAT CTCCCCTACT ATCTGGTCAT 6 60 

ATTATTCTTT TATTGGCCAC TTTATGAGTT GTTCTTACTA GTTGTTTCTG ACCCCCTTAC 720 

ACTCAAGGGA CTCTATATAA ACAATCTTCT CTTCTTTACA CCTCTGGTAA TCTTGATTGT 780 

ATCGTTACTC TATAGCTACC GTTTCCGTTT CTCACTTTGA TGGTTAGTTG GTAACGGACT 840 

GCTCTTTTAC TTTACTATCA TAACCTTTGG TGAGTTTATA CTAATTTACT TGCTAATCTA 900 

TGAAACAGTT GCTCTGGTCG GCATGGATTC TGGTATTAGC ATCAAGCATA TTCTACAAAA 960 

AATGAAAAAC AAAAAACTTT CACAAAATCC TTGAAAAATC TCACAATCAT GCTATAATAA 102 0 

TCCATAGAGA CAAGTCACTT AGTCCCTTTC TACTAGAGAG TGCGTGGTTG CTGGAAACGC 108 0 

ATAGGAAGTC TAAACTGATA CTACTCTTGA GTTTTTTATG AAAACATAAA ACGGTGGCCA 1140 

CGTTAGAGCC GATCAGAGGT GTCCCTCTCT TTTGAGGTAC ATAAATGAAG GTGGAACCAC 1200 

GTTGCGACGT CCTTTCGAGG ATGTCGCATT TTTTTATTAG GATACTAATT ATGGAGTTGC 12 60 

AAGAATTAGT GGAGCGCAGT TGGGCAATCC GACAAGCTTA TCACGAACTG GAAGTTAAGC 1320 

ATCATGATTC CAAGTGGACG GTAGAAGAAG ACCTCTTGGC TTTATCTAAT GATATTGGAA 1380 

ATTTCCAACG ACTGGTGATG ACAAAGCAAG GACGCTACTA TGATGAAACA CCCTACACAC 144 0 

TGGAACAAAA ACTTTCAGAA AATATCTGGT GGCTATTAGA ACTTTCTCAA CGTTTGGATA 1500 

TAGACATTCT GACGGAAATG GAAAACTTCC TCTCTGATAA AGAAAAGCAA TTGAACGTTA 1560 

GGACTTGGAA GTAGTCTGCT GATAAAAAAT CAATGCTTAG AAACTATGAA ATAATAAAAA 1620 
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AGGAGAACAT CATGATTAAC ATTACTTTCC CAGATGGCGC TGTTCGTGAA TTCGAATCTG 1680 

GCGTAACAAC TTTTGAAATT GCCCAATCTA TCAGCAATTC CCTAGCTAAA AAAGCCTTGG 1740 

CTGGTAAATT CAACGGCAAA CTCATCGACA CTACTCGCGC TATCACTGAA GATGGAAGCA 1800 

TCGAAATTGT GACACCTGAT CACGAAGATG CCCTTCCAAT CTTGCGTCAC TCAGCAGCTC 1860 

ACTTGTTCGC CCAAGCAGCT CGTCGTCTTT TCCCAGACAT TCACTTGGGA GTTGGTCCAG 1920 

CCATCGAAGA TGGTTTCTAC TACGATACTG ACAACACAGC TGGTCAAATC TCTAACGAAG 1980 

ACCTTCCTCG TATCGAAGAA GAAATGCAAA AAATCGTCAA AG AAAAC T T C CCATCTATTC 2040 

GTGAAGAAGT GACTAAAGAC GAGGCACGTG AAATCTTL'AA AAATGACCCT TACAAGTTGG 2100 

AATTGATTGA AGAACACTCA GAAGACGAAG GCGGTTTGAC TATCTATCGT CAGGGTGAAT 2160 

ATGTAGACCT CTGCCGTGGA CCTCACGTTC CATCAACAGG TCGTATCCAA ATCTTCCACC 222 0 

TTCTCCATGT AGCTGGTGCG TACTGGCGTG GAAACAGCGA CAACGCTATG ATGCAACGTA 2 2 80 

TCTACGGTAC AGCTTGGTTT GACAAGAAAG ACTTGAAAAA CTACCTTCAA ATGCGTGAAG 2 340 

AAGCTAAGGA ACGTGACCAC CGTAAACTTG GTAAAGAGCT TGACCTCTTT ATGATTTCAC 24 00 

AAGAAGTGGG ACAAGGTTTG CCATTCTGGT TGCCAAATGG TGCGACTATC CGTCGTGAAT 24 60 

TGGAACGCTA CATCGTAAAC AAAGAGTTGG TTTCTGGCTA CCAACACGTC TACACTCCAC 252 0 

CACTTGCTTC TGTTGAGCTT TACAAGACTT CTGGTCACTG GGATCATTAC CAAGAAGACA 2580 

TGTTCCCAAC CATGGACATG GGTGACGGGG AAGAATTTGT CCTTCGTCCA ATGAACTGTC 2640 

CGCACCACAT CCAAGTTTTC AAACACCATG TTCACTCTTA CCGTGAATTG CCAATCCGTA 2700 

TCGCTGAAAT CGGTATGATG CACCGTTACG AAAAATCTGG TGCCCTCACT GGCCTTCAAC 27 60 

GTGTACGTGA AATGTCACTC AACGACGGTC ACCTATTCGT TACTCCAGAA CAAATCCAAG 282 0 

AAGAATTCCA ACGTGCCCTT CAGTTGATTA TCGATGTTTA TGAAGACTTC AACTTGACTG 2880 

ACTACCGCTT CCGCCTCTCT CTTCGTGACC CTCAAGATAC TCATAAGTAC TTTGATAACG 294 0 

ATGAGATGTG GGAAAATGCC CAAACCATGC TTCGTGCAGC TCTTGATGAA ATGGGCGTGG 3000 

ACTACTTTGA AGCCGAAGGT GAAGCAGCCT TCTACGGACC AAAATTGGAT ATCCAGATTA 3060 

AAACTGCCCT TGGAAAAGAA GAAACCCTTT CTACTATCCA ACTTGATTTC TTGTTGCCAG 312 0 

AACGCTTCGA CCTCAAATAC ATCGGAGCTG ATGGCGAAGA TCACCGTCCA GTCATGATCC 3180 

ACCGTGGGGT TATCTCAACT ATGGAACGCT TCACAGCTAT CTTGATTGAG AACTACAAGG 3240 

GGGCCTTCCC AACATGGCTG GCACCACACC AAGTAACCCT CATCCCAGTA TCTAACGAAA 3300 

AACACGTGGA CTACGCTTGG GAAGTGGCCA AGAAACTCCG TGACCGCGGT GTCCGTGCAG 3360 
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ACGTAGATGA GCGCAATGAA AAAATGCAGT TCAAGATCCG TGCTTCACAA ACCAGCAAGA 342 0 

TTCCTTACCA ATTAATTGTT GGAGACAAAG AAATGGAAGA CGAAACAGTC AACGTTCGTC 348 0 

GCTACGGCCA AAAAGAAACA CAAACTGTCT CAGTTGATAA TTTTGTTCAA GCTATCCTAG 3540 

CTGATATCGC CAACAAATCA CGCGTTGAGA AATAAGAGTC TAGCATAAAA GCCTCCAATC 3600 

TGGAGGCTTT TTCTCATCTA TTTTTACTCA AGGACTAAGT TCACTTGAGC AAACTGAATC 3660 

CGCACTGTCG TTCCTTTTCC GACCTCAGAC TCGATACGAA TCTGGTGCCC CAGTTCTTCA 3720 

GAAATTTTCT TAGATAGATA AAGGCCAAGT CCAGAGGACT GCTGGGTCAA ACGGCCATTG 3780 

TATCCTGAAA AGCCACGTTC AAATACTCGG AGGACATCAC TGTTTTTTAT CCCGATTCCC 384 0 

GTATCTTTGA TACAAAGCTC TTGGTCATCC ATATAAATCT CCAGACCACC TTCCTTGGTG 3900 

TACTTGAGAC TGTTTGAGAT GATTTGCTCA ATAACCACTA GCAGCCACTT TTTATCCGTC 3960 

ACGATTTCTT TATCAAGGTC ATGTAGATTG ACATTTAAGC CTTTTTGAAT AAAGAAAAGA 4 020 

GCATATTTAC GAATTATTTC CTTGACCAAG TCCTCAATTT GAACCTGCTT TAAGACCAAA 4080 

TCATCATGGA AACTTTCTAA ACGCAGGTAC TGTAAAACTA GGTTGGTATA GGAGTCGATT 4140 

TTGAAAATTT CCTGTTCTAG CTGCTGCTTC AGTTGGCGGT CGACCACTTC TGCAACTAAG 42 00 

AGTTGACTGG CTGCAATGGG GGTCTTTATC TGATGGACCC ACAAGGTATA GTAATCCAGC 42 60 

AAATCCGTCA GTTTTCTTTC TGCTTTTGAC CTCTGCTGAT AGAGTTCCAT CTCACGCGCT 4320 

TCTAATTTTT CTGCTAAAGC TATTTCCAAA GGAGACTTGG CTTCCCTCTC TCCATAGAGA 4380 

AGTTCCTGGC GATAGACCTG CGTTTCCACC AATATGTCCC AAGTGAAAAA TAATATGGTT 444 0 

ACAAAGCAAC ACAAGAAGAA AAAGTAGAGG AAGTAAATTC CTAGACTGGC AAATAAAAAC 4500 

TGAAAGAGTA AGACAAGAAA TGCCAAAGAA AGCAGATAGA TAAAAAGACG ACTACGGGAG 4560 

CGCAGATAGG CTAGAAAAAA TTGTTTCCAA TCAAGCATGC TTCAATCCGT ACCCTATTCC 4 620 

TTTCTTGGTC TCGATAAATC CTACCAATCC CTGCTCCTCC AACTTTTTAC GCAAACGAGC 4 680 

CACATTGACA GAGAGGGTAT TAT CAT C AAT GAAAAAGTCA CTGTTCCAAA GTTCCCGCAT 4740 

CAGGTCGTCA CGTGCTACGA TGTTGCCTGC ATGCTCAAAT AACACGCGTA AAATCTGGAA 4800 

TTCATTCTTG GTCAAATTCA AGACTTGCCC TTGATAATGT AAATCCATGG ATTTGGTATT 4860 

GAGGATAACA CCAGCATATT CCAGCAAACT CTCATCACGC CCAAACTCAT AGGAACGACG 4920 

CAACAAGCCC TGAACCTTAG CTAAAAGAAC CTGCTGGTCA AAAGGCTTGG TCACAAAGTC 4 980 

ATCCGCCCCC ATATTGATTG CCATGACAAT ATCCATAGCC TGGTCTCTCG AAGAAAGAAA 5040 

CATGATAGGT ACCTTGGAAA TCTTGCGGAT TTCCTGACAC CAGTGATAAC CATTAAACAA 5100 

GGGCAAACCA ATATCCATGA GGACCAGATG AGGTTCCGAC TGAACAAATA GACTCAAAAC 5160 
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TTCCATAAAG TCTTCTACCA GGACCACTTC AAATCCCCAT TCAGAGAGCA TTTTCCCAAT 5220 

CTGTTGACGA ATGACCTGAT CATCTTCTAT TAATAAAATC TTGTGCATGC GCTTCTCCTT 528 0 

TTCCATTATT ATAACAGATT TTTCCATGCT AGATGGTCTG AAACTGAATT TGAAATAGCC 5340 

TGTTTTTAGC CAGTACAAAC AGGCTATGCT ACTAGCTAAT TTGAGGGAAA TTTGCTAAGA 54 00 

TAAATAAAAA GAAAGGAGCT CTTATGGCCA ATATTTTTGA CTATCTGAAA GATGTCGCAT 54 60 

ATGATTCTTA TTACGACCTT CCCTTGAATG AGTTAGACAT TCTAACCTTA ATAGAAATCA 5520 

CCTACCTCTC CTTTGATAAT CTGGTCTCCA CACTTCCTCA ACGTCTTTTA GATCTAGCAC 55 8 0 

CTCAGGTTCC AAGAGATCCC ACCATGCTTA CTAGCAAAAA TCGCCTTCAA TTATTAGATG 5640 

AATTGGCTCA ACACAAGCGC TTCAAAAATT GCAAACTCTC CCATTTTATC AACGACATCG 57 00 

ACCCTGAACT GCAAAAGCAA TTTGCGGCTA TGACTTATCG TGTCAGCCTC GATACCTATC 57 60 

TGATTGTCTT TCGTGGGACA GATGACAGTA TCATTGGCTG GAAGGAAGAT TTCCACCTGA 582 0 

CCTATATGAA GGAAATTCCT GCTCAAAAGC ACGCCCTTCG CTATTTAAAG AACTTTTTTG 5880 

CCCATCATCC TAAGCAAAAG GTTATTCTAG CTGGGCATTC CAAGGGAGGA AATCTCGCTA 594 0 

TCTATGCTGC TAGCCAAATT GAGCAAAGTT TGCAAAATCA GATCACAGCA GTTTATACAT 600 0 

TTGATGCACC TGGTCTCCAT CAAGAATTGA CACAGACTGC GGGTTATCAA AGGATAATGG 6060 

ATAGAAGCAA GATATTCATT CCACAAGGTT CCATTATCGG TATGATGCTG GAAATTCCTG 512 0 

CTCACCAAAT CATCGTTCAG AGTACTGCCC TGGGTGGCAT CGCCCAGCAC GATACCTTTA 6180 

GTTGGCAGAT TGAGGACAAG CACTTCGTCC AACTGGATAA GACCAACAGT GATAGCCAGC 6240 

AAGTAGACAC AACCTTTAAA GAATGGGTGG CCACAGTCCC TGACGAAGAA CTTCAGCTCT 530 0 

ACTTCGACCT CTTCTTTGGC ACTATTCTTG ATGCTGGTAT TAGCTCTATC AATGACTTGG 6 360 

CTTCCTTAAA GGCGCTTGAA TACATTCATC ATCTCTTTGT CCAAGCTCAA TCCCTCACTC 6420 

CAGAAGAAAG AGAAACCTTG GGTCGCCTTA CCCAGTTATT GATTGATACT CGTTACCAGG 6480 

CATGGAAAAA TAGATAATAC TCTTGAAAAT TAAATGTATA CAAAACAAAA GACCTAGAAT 6 540 

ACATACTTTC ATGTGCATTC TAAGTCTTTT TAAATAGAAT CTAATAGTCA ATAAAAATCA 6600 

AAGAGCATTG AGAGATAATG GGGCTTGGAA CGTCCCTCTC GCTTCAACAA AATGACCCCA 6 660 

TTATAGATTA AAAAGATGCC ACT T AG AAAA AGCAAAAAAG GAAGTAAGAC AAAGGCAAAT 6720 

ATATAAAAAG CTAACTGAAC ATTCTCGTAT CCATTTTTAT AAAAAAGGTA GGATAGATAA 6780 

AAATAACTTG AAATGAGGGA TAATAAAAAT AATACTGGAT TCCACAAACT TCTATTATCC 6840 

TTCCAAAATG ACACTATAAA GGCTAATACA ATTCCTATAA CGAGATACAT TTCTTACTCC 6900 
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TTTAATAGCT ACATTTTATC ATAATTATCC AAAGAAAAAA GAGGGCATTT ATCCCTCTTA 6 9 60 

ATCCTTCATC TGACTCTCTG CATCGGCCAC GACTTTTTCT AGACTGGTTT GACCAAGTTC 7020 

TGCCTCCATA GTCAACTGAA TTCTCTCCAA TTTTTGATCC AAAACATCAT GAATATGAGC 7 080 

TCCTACAGGG CAATTTGGAT TCGGATTGTC ATGGAAACTG AAGAGTTGAC CTGTCTTACC 7140 

AAGACATTCG ACCGCCTGAT AAACATCTAA AAGACTAATA TCCTTAAGGT CCTTGACAAT 72 00 

CTCTGTTCCG CCCGTTCCAC GCGCTACTGA AATCAGCTCT GCCTTCTTCA ACTGGGACAA 72 60 

GATCTTTCTG ATAATGACAG GATTGACCCC GACACTAGCA GCCAGAAAAT CACTGGTCAC 7 320 

CTTGCTTTCC TTCCCCTCGA GGGCAATGAT TATCAGCATA . TGAGTCGCAA TGGTAAATCT 73 8 0 

ACTTGGAATT TGCATCCTCT TCTCCTTTTT ACGAGGCTAC CCTGCCTCTA CTCTTCTTTT 74 40 

TCTATTATTA TACCCTTTTT AGTTGTAATG TCAATCGTTA CCACTTTTCA ACCAGTCGTC 7 500 

TAACTCCCGA TCGCAGCCC'T CTTTCTGAGC CAATTCTCTC AAAAATTCCT GATGATGAGT 7560 

ATGGTGGATC CCATTGACCA GACTTTCATA GTAAACCTCA AAATAGGGAA GTCTCAGGTC 7 62 0 

TTTAGCCAGC TGCAATTCAG CTGCTACATC GTAGTCTACC CGTCGGAAGT CCATATCTAC 7 680 

CAGGCCTTTG TCATCAAACT CCAAAATCAT ATACTGGGCC CGCAAGTCCT TCCGTAGCTG 774 0 

AGCGTCCAAA AAGAAAGGTT GGCCAATCGA ACCCGGATTG ACAATCAATT GCCCACCAGT 7 8 00 

CCCGTAACGA AGCAACTGCT GGTGAATATG TCCATAAACA GCAATATCAC AGGGAGGATG 7860 

AGTCACCAAG CGGTCAAACT CCTCTTGTTT GCCACTATGA ATCAACTCTC GCCCCCAGTT 7920 

CTTATCAGGC AGATGATGGC TAATTCCCAC CGTCAAATCC CCAAACTGAC GATGAATTTG 79 80 

AAGAGGTTGA TTGTGGAGCA CTTCAATTTC TTCTAGGGAA ATTTCCTCTA AAACATACTG 8040 

GCACTGGCGC AAGAGATAGC GTTGACTGGG GCGAGTACTG TCCAATTCCT TACGGACACC 8100 

ATGCCAAAGA CTGTCTTCCC AGTTTCCCAA AACTCTAGCC GTAATCGGTA GTTGATCCAA 816 0 

CAAGTCCAAA ATCCTTCTAC GCCCTGTCCC TGGCATGAGA ATATCTCCCA AAAGCCAGTA 82 2 0 

TTCATCCACT CCTATCTGCC GAGCATCTGC CAAAACAGCC TCCAAGGCGG TGGTATTTCC 82 80 

ATGAATATCT GAAAGAAGAG CTATTTTCGT CATATCCATC TCCTCGTTTT TTCTCTTGCA 334 0 

ATAAGTATAA CATAAAAAGT CACAGCTAGA GAAATCTAGC TTTTTTTGAT ATACTAGATA 84 00 

AAGATATTAG ACAAGAGGAA ACGAATGACC CCAAACAAAG AAGACTATCT AAAATGTATT 84 6 0 

TATGAAATTG GCATAGACCT G C AT AAGATT ACCAACAAGG AAATTGCGGC TCGCATGCAA 852 0 

GTCTCTCCCC CTGCCGTAAC TGAAATGATC AAACGAATGA AAAGTGAAAA TCTCATCCTA 8580 

AAGGACAAGG AATGTGGCTA TCTACTGACT GACCTCGGTC TCAAACTGGT CTCTGAGCTC 8640 

TATCGTAAGC ACCGCTTGAT TGAAGTTTTT CTAGTTCATC ATTTAGACTA TACAAGTGAC 87 00 
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CAGATTCACG AGGAAGCTGA GGTCTTGGAA CACACTGTCT CTGACCTGTT CGTGGAAAGA 8760 

CTAGATAAAC TGCTAGGTTT CCCTAAAACC TGCCCCCACG GGGGAACTAT TCCTGCCAAG 8820 

GGAGAACTAC TCGTTGAAAT CAATAACCTC CCACTAGCTG ATATCAAGGA AGCTGGCGCC 8880 

TACCGCCTGA CTCGGGTGCA CGATAGTTTT GACATTCTCC ATTATCTGGA CAAGCACTCA 8940 

CTTCACATCG GTGACCAGCT CCAAGTCAAG CAGTTTGATG GCTTCAGCAA TACCTTCACT 9000 

ATCCTCAGTA ACGACGAGGA TTTACAAGTG AATATGGACA TTGCAAAACA ACTCTATGTC 9060 

GAGAAAATCA ACTAATTTCT CAAGTCCCCT ACCAACCCTG AAAGTTTTAT TTTGGCTCTT 9120 

TGTCAACTGT AGTGGGTTGA AGTCAGCTAA GCTCGAGAAA GGACAAATTT TGTCCTTTCT 9180 

TTTTTGATAT TCAGAGCGAT AAAAATCCGT TTTTTGAAGT TTTCAAAGTT CCGAAAACCA 92 40 

AAGGCATTGC GCTTGATAAG TTTGATGAGA TTATTGGTCG CTTCCAGTTT GGCATTAGAA 9300 

TAGTGTAGTT GAAGGGCGTT GACAATCTTT TCTTTATCTT TGAGGAAGGT TTTAAAGACA 93 60 

GTCTGAAAAA TAGGATGAAC CTGCTTTAGA TTGTCCTCAA TGAGTCCGAA AAATTTCTCC 942 0 

GGTTTCTTAT TCTGAAAGTG AAACAGCAAG AGTTGATAGA GCTGATAGTG GTGTTTCAAG 94 80 

TCTTGTGAAT AGCTCAAAAG CTTGTCTAAA ATCTCTTTAT TGGTTAAGTG CATACGAAAA 9540 

GTAGGACGAT AAAATCGCTT ATCACTCAGT TTACGGCTAT CCTGTTGTAT GAGCTTCCAG 9 600 

TAGCGCTTGA TAGCCTTGTA TTCATGGGAT TTTCGATCCA ATTGGTTCAT AATTTGAACA 9660 

CGCACACGAC TCATAGCACG GCTAAGATGT TGTACAATGT GAAAGCGATC CAACACGATT 972 0 

TTAGCATTCG GGAGTGAAAC AGTCTGGGAG ACTGTTTCAG CCTGAGCCTA GAAATTTGAA 9780 

AGCGAAGCTG TTTAGCCAAG TCATAGTAAG GACTAAACAT ATCCATCGTA ATGATTTTCA 9840 

CTTGACAACG AACGGCTCTA TCGTAGCGAA GAAAGTGATT TCGGATGACA GCTTGTGTTC 9900 

TGCCTTCAAG AACAGTGATA ATATTAAGAT TATCAAAATC TTGCGCAATG AAACTCATCT 99 60 

TTCCCTTAGT GAAGGCATAC TCATCCCAAG ACATAATCTT TGGAAGCCGA GAAAAATCAT 10 02 0 

GCTCAAAGTG AAAGTCATTG AGCTTGCGAA TGACAGTTGA AGTTGAAATG GCCAGCTGAT 10 080 

GGGCAATATC AGTCATAGAA ATTTTTTCAA TTAACTTTTG AGCAATyTTT TGGTTGATGA 1014 0 

TACGAGGGAT TTGGTGATTT TTCTTTACCA GGGGAGTCTC AGCAACCATC ATTTTTGAAC 102 00 

AGTGATAGCA CTTGAAACGA CGCTTTCTAA GGAGAATTCT AGAAGGCATA CCAGTCGTTT 102 60 

CAAGATAAGG AATTTTAGAA GGTTTTTGAA AGTCATATTT CTTCAATTGG TTTCCGCACT 103 2 0 

CAGGGCAAGA TGGGGCGTCG TAGTCCAGTT TGGCGATGAT TTCCTTGTGT GTATCCTTAT 103 80 

TGATGATGTC TAAAATCTGG ATATTAGGGT CTTTAATGTC TAGTAATTTT GTGATAAAAT 10440 
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ATGGGACTTT TTTTCTACAA TAAAATAGGC TCCATAATAT CTATAGTGGA TTTACCCACT 10560 

ACAAATATTA TAGAACCGTA AAAATAGAAG GAGATAGCAG GTTTTCAAGC CTGCTATCTT 10 62 0 

TTTTTGATGA CATTCAGGCT GATACGAAAT CATAAGAGGT CTGAAACTAC TTTCAGAGTA 10680 

GTCTGTTCTA TAAAATATAG TAGATTGAAA TAAGATGTGA ACAACTCTAT CAGGAAAGTC 10740 

AAATTAATTT ATAGAATTAT TTTAGCAGTC AAGGTGTACT GTTATAGATT CAATATATTA 10800 

TATGACTATT AACCTTGTCT TCTCCTAAAA TTGACTTTCT TGTTTTCTTA TCTTGTCCAC 108 60 

TCGAAACAAG TATTGTAAGA ATTTGATTAT TTTTGAAAGT ACTTTTAATA TACTTGATAT 1092 0 

AGTTAAAAAA GATTTGAAAC TAAATTCCAA ATTAGAAAAA GACTTGAAAT ACTAAAAAAA 10980 

AAAAAGTATA CTCTAATTGA AAACGGTAAC AAAACTAATT TAGAGAATGA AATATAGAGT 11040 

ATTTCTCTCT TAAAAGTTTT TGGTGAAACG AGATGTAGAA AGGAGATTTA GCCAAAGAGT 11100 

CTATTAGTGC TAGAATAATA GATTAGAATT ATTTTAGAAA AACGAAGTGA GCAGCTTATA 11160 

AATTCAAGTC CCCAAATAGA TTCATACTAG TATCTTTTGC AAAAAATAAA GGGCGACTTC 11220 

CTTCATGAAT ATCAATTTCA TCTATAAGGA AGGTAGCTAA TTGAACTAAC TTATTTATTC 112 80 

TGTTTGTCGC TAGAAAAATC AGACCTCCTT GTGAAGATTG AGGAGATACT TAATGAAAAT 1134 0 

CAAAGAAGAA ACTAGCAAGC TAGTAGCAGA TTGCCCAAAA CACCGCTTTG AGGTTGTAGA 11400 

TAAGACTGAC CTATATAATC CAAGGTGAAG CGACTGTGGT TTGAAGAGAT TTTCAAAGAG 114 60 

TATAGGCTAG AGAGTAGTGT TTTTATGTCC TTCTAGTAGA AAATGCTAGA CAGAAGAATG 11520 

GGGAACTTGG ATAGGAAAAA TAGATTGAGA AAGGAGGTTA GAAGAGATGA TTATTACAAA 11580 

AATTAGCCGT TTAGGAACTT ATGTGGGAGT AAATCCACAT TTTGCAACAT TAATAGATTT 11640 

TCTAGAAAAA ACAGGACTAG AAAATTTAAC AGAAGGTTCG ATTGCTATCG ATGGTAATCG 11700 

ATTGTTTGGG AATTGCTTTA CTTATCTAGC AGATGGTCAA GCAGGGGCTT TCTTTGAAAC 117 60 

CCACCAAAAA TATTTGGATA TTCATTTAGT TTTGGAAAAC GAAGAAGCCA TGGCTGTTAC 1182 0 

ATCGCCGGAA AATGTAAGCG TTACCCAAGA ATATGATGAA GAGAAAGATA TTGAATTATA 1188 0 

CACAGGGAAA GTGGAACAGT TGGTTCATTT GAGAGCTGGC GAATGCCTCA TCACTTTTCC 11940 

AGAAGATTTA CATCAACCCA AGGTTCGTAT AAATGATGAA CCTGTGAAAA AAGTTGTCTT 12000 

TAAAGTTGCG ATTTCTTAAT GTAGAAAGAG AAGAACGATG AAAAAAATGA GAAAGTTTTT 12060 

ATGTCTAGCT GGAATTGCGC TAGCGGCTGT TGCCTTGGTA GCTTGTTCAG GAAAAAAAGA 1212 0 

AGCTACAACT AGTACTGAAC CACCAACAGA ATTATCTGGT GAGATTACAA TGTGGCACTC 12180 

CTTTACTCAA GGACCCCGTT TAGAAAGTAT TCAAAAATCA GCAGATGCTT TCATGCAAAA 12240 
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GCATCCAAAA ACGAAAATCA AGATTGAAAC ATTTTCTTGG AATGACTTCT ATACTAAATG 12300 

GACTACAGGT TTAGCAAATG GAAATGTGCC AGATATCAGT ACAGCTCTTC CTAACCAAGT 123 60 

AATGGAAATG GTCAACTCAG ATGCTTTGGT TCCGCTAAAT GATTCTATCA AGCGTATTGG 1242 0 

ACAAGATAAA TTTAACGAAA CTGCCTTAAA TGAAGCAAAA ATCGGAGATG ATTACTACTC 12480 

TGTTCCTCTT TATTCACATG CACAAGTCAT GTGGGTTAGA ACAGATTTGT TAAAAGAACA 1254 0 

TAATATTGAG GTTCCTAAAA CTTGGGATCA ACTCTATGAA GCTTCTAAAA AATTGAAAGA 12600 

AGCTGGAGTT TATGGCTTGT CTGTTCCGTT TGGAACAAAT GACTTAATGG CAACACGTTT 12 660 

CTTGAACTTC TACGTACGTA rTGGTGGAGG AAGCCTCTTA ACAAAAGATC TTAAAGCAGA 12720 

CTTGACAAGC CAACTTGCTC AAGATGGTAT TAAATACTGG GTTAAATTGT ATAAAGAAAT 12780 

CTCACCTCAA GATTCTTTGA ACTTTAATGT CCTTCAACAA GCTACCTTGT TCTATCAAGG 1284 0 

AAAAACAGCA TTTGACTTTA ACTCTGGCTT CCATATCGGA GGAATTAATG CCAACAGTCC 12900 

TCAATTGATT GATTCGATTG ATGCTTATCC TATTCCAAAA ATCAAAGAGT CTGATAAAGA 129 60 

CCAAGGAATT GAAACCTCAA ACATTCCAAT GGTTGTTTGG AAAAATTCAA AACATCCAGA 13020 

AGTTGCTAAA GCATTCTTAG AAGCACTTTA TAATGAAGAA GACTACGTTA AATTCCTTGA 13080 

TTCAACTCCA GTAGGTATGT TGCCAACTAT TAAGGGGATT AGCGATTCTG CAGCCTATAA 1314 0 

AGAAAATGAA ACTCGTAAGA AATTTAAACA TGCTGAAGAA GTAATTACTG AAGCTGTTAA 13200 

AAAAGGTACT GCTATTGGTT ATGAAAATGG GCCAAGTGTA CAAGCTGGTA TGTTGACTAA 13260 

CCAACACATT ATTGAACAAA TGTTCCAAGA TATCATTACA AATGGAACAG ATCCTATGAA 13320 

AGCAGCAAAA GAAGCAGAAA AACAATTAAA TGATTTATTT GAGGCTGTTC AGTAGATGTA 13380 

AAAGACTAGA AAATAGGTGG GATAGTGAGC TGAAAAGCTC TAGCCCAATC TTGTAAAAGA 13440 

AGGGAGAAGG AGAATGGTTA AAGAACGTAA TTTAACTCGC TGGATATTTG TTTTGCCAGC 13 500 

TATGATTATC GTAGGATTAC TCTTTGTTTA TCCGTTTTTC TCGAGTATTT TTTATAGCTT 13 560 

TACCAATAAG CATTTGATTA TGCCTAATTA TAAATTTGTT GGTTTGGCTA ACTATAAAGC 13 620 

TGTGCTATCA GATCCCAACT TCTTTAATGC GTTCTTTAAT TCAATTAAGT GGACCGTTTT 13 680 

C T CAT T AGTT GGTCAAGTTT TAGTAGGGTT TGTATTGGCT TTAGCTCTTC ACAGAGTACG 13740 

CCACTTCAAG AAATTATATA GGACATTATT GATTGTTCCT TGGGCATTTC CTACCATCGT 13 800 

TATTGCCTTC TCTTGGCAGT GGATTCTAAA CGGGGTTTAT GGCTACTTAC CTAATCTAAT 13860 

CGTAAAATTA GGTTTAATGG AACATACACC TGCATTTTTG ACAGATAGTA CATGGGCATT 13 920 

CCTATGTTTG GTGTTTATCA ACATTTGGTT TGGAGCACCA ATGATTATGG TTAATGTGCT 13 980 



WO 98/18931 



PCT/US97/19588 



TTCAGCTTTG CAAACAGTAC CAGAAGAACA ATTTGAGGCT GCTAAGATAG ATGGTGCTTC 14040 

AAGTTGGCAG GTGTTCAAGT TTATCGTCTT TCCACATATT AAAGTGGTTG TAGGACTTCT 14100 

AGTTGTTTTG AGAACTGTAT GGATCTTTAA TAACTTTGAC ATTATCTACC TCATTACTGG 1416 0 

TGGTGGACCA GCCAATGCTA CAACGACGCT TCCAATTTTT GCTTACAACC TGGGCTGGGG 14220 

AACTAAATTG TTGGGTCGTG CTTCAGCAGT TACAGTACTG CTCTTTATCT TCTTGGTGGC 1428 0 

GATTTGCTTT ATCTACTTTG CTATCATCAG TAAGTGGGAA AAGGAGGGTA GAAAATAATG 1434 0 

AAGAAGAAAT CCAGTATTTA TTTAGATATT CTCTCACATG TACTTTTAGT TGGTGCGACC 14400 

ATCGTTGCAG TTTTCCCATT GGTATGGATT ATCATATCTT CTGTCAAAGG GAAAGGGGAA 1446 0 

TTAACTCAGT ATCCAACACG ATTTTGGCCT GAACAGTTTA CATTAGATTA TTTCACTCAT 14520 

GTTATCAACG ATTTGCACTT CATTGATAAC ATTCGAAACA GTTTAATCAT TGCCTTGGCT 14580 

ACAACCCTTA TTGCGATTAT TATTTCTGCT ATGGCAGCCT ATGGTATTGT TCGATTCTTT 14640 

CCTAAATTGG GAGCAATCAT GTCGAGACTA CTCGTCATTA CCTACATTTT CCCACCAATT 14700 

TTGTTAGCAA TTCCCTATTC AATTGCCATT GCTAAAGTTG GGTTAACAAA TAGTTTATTT 14 760 

GGCTTGATGA TGGTTTATCT ATCTTTTAGT GTTCCATATG CAGTTTGGCT CTTAGTTGGA 14 820 

TTTTTCCAAA CAGTTCCAAT TGGAATTGAA GAAGCGGCTA GAATTGATGG TGCAAATAAA 14 880 

TTTGTTACGT TTTATAAAGT TGTGCTACCG ATTGTAGCAC CAGGTATTGT AGCAACAGCT 14 940 

ATTTATACAT TTATCAATGC TTGGAATGAA TTCCTGTATG CCTTGATTTT GATTAACAAT 15000 

ACAGGAAAGA TGACAGTAGC AGTAGCCCTT CGTTCACTTA ATGGTTCAGA AATACTAGAC 15060 

TGGGGAGATA TGATGGCAGC GTCTGTTATT GTAGTTCTTC CATCAATTAT TTTCTTCTCT 15120 

ATCATCCAAA ATAAGATTGC AAGTGGATTA TCAGAAGGAT CTGTGAAGTA GACGAAAGAA 15180 

GGAAAAAAAT GAATAAAAGA GGTCTTTATT CAAAACTAGG AATTTCCGTT GTAGGCATTA 15240 

GTCTTTTAAT GGGAGTCCCC ACTTTGATTC ATGCGAATGA ATTAAACTAT GGTCAACTGT 15300 

CCATATCTCC TATTTTTCAA GGAGGTTCAT ATCAACTGAA CAATAAGAGT ATAGATATCA 153 60 

GCTCTTTGTT ATTAGATAAA TTGTCTGGAG AGAGTCAGAC AGTAGTAATG AAATTTAAAG 15420 

CAGATAAACC AAACTCTCTT CAAGCTTTGT TTGGCCTATC TAATAGTAAA GCAGGCTTTA 15480 

AAAATAATTA CTTTTCAATT TTCATGAGAG ATTCTGGTGA GATAGGTGTA GAAATAAGAG 15540 

ACGCCCAAAA GGGAATAAAT TATTTATTTT CCAGACCAGC TTCATTATGG GGAAAACATA 15600 

AAGGACAGGC AGTTGAAAAT ACACTAGTAT TTGTATCTGA TTCTAAAGAT AAAACATACA 15660 

CAATGTATGT TAATGGAATA GAAGTGTTCT CTGAAACAGT TGATACATTT TTGCCAATTT 15720 

CAAATATAAA TGGTATAGAT AAGGCAACAC TAGGAGCTGT TAATCGTGAA GGTAAGGAAC 15780 
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ATTACCTCGC AAAAGGAAGT ATTGATGAAA TCAGTCTATT TAACAAAGCA ATTAGTGATC 15840 

AGGAAGTTTC AACTATTCCC TTGTCAAATC CATTTCAGTT AATTTTCCAA TCAGGAGATT 15900 

CTACTCAAGC TAACTATTTT AGAATACCGA CACTATATAC ATTAAGTAGT GGAAGAGTTC 159 60 

TATCAAGTAT TGATGCACGT TATGGTGGGA CTCATGATTC TAAAAGTAAG ATTAATATTG 16020 

CCACTTCTTA TAGTGATGAT AATGGGAAAA CGTGGAGTGA GCCAATTTTT GCTATGAAGT 1608 0 

TTAATGACTA TGAGGAGCAG TTAGTTTACT GGCCACGAGA TAATAAATTA AAGAATAGTC 16140 

AAATTAGTGG AAGTGCTTCA TTCATAGATT CATCCATTGT TGAAGATAAA AAATCTGGGA 16200 

AAACGATATT ACTAGCTGAT GTTATGCCTG CGGGTATTGG AAATAATAAT GCAAATAAAG 16260 

CCGACTCAGG TTTTAAAGAA ATAAATGGTC ATTATTATTT AAAACTAAAG AAGAATGGAG 16320 

ATAACGATTT CCGTTATACA GTTAGAGAAA ATGGTGTCGT TTATAATGAA ACAACTAATA 16380 

AACCTACAAA TT AT ACT AT A AATGATAAGT ATGAAGTTTT GGAGGGAGGA AAGTCTTTAA 1644 0 

CAGTCGAACA ATATTCGGTT GATTTTGATA GTGGCTCTTT AAGAGAAAGG CATAATGGAA 16500 

AACAGGTTCC TATGAATGTT TTCTACAAAG ATTCGTTATT TAAAGTGACT CCTACTAATT 16560 

ATATAGCAAT GACAACTAGT CAGAATAGAG GAGAGAGTTG GGAACAATTT AAGTTGTTGC 16620 

CTCCGTTCTT AGGAGAAAAA CATAATGGAA CTTACTTATG TCCCGGACAA GGTTTAGCAT 16680 

TAAAATCAAG TAACAGATTG ATTTTTGCAA CATATACTAG TGGAGAACTA ACCTATCTCA 16740 

TTTCTGATGA TAGTGGTCAA ACATGGAAGA AATCCTCAGC TTCAATTCCG TTTAAAAATG 15800 

CAACAGCAGA AGCACAAATG GTTGAACTGA GAGATGGTGT GATTAGAACA TTCTTTAGAA 15860 

CCACTACAGG TAAGATAGCT TATATGACTA GTAGAGATTC TGGAGAAACA TGGTCGAAAG 16920 

TTTCGTATAT TGATGGAATC CAACAAACTT CATATGGCAC ACAAGTATCT GCAATTAAAT 16980 

ACTCTCAATT AATTGATGGA AAAGAAGCAG TCATTTTGAG TACACCAAAT TCTAGAAGTG 17040 

GCCGCAAGGG AGGCCAATTA GTTGTCGGTT TAGTCAATAA AGAAGATGAT AGTATTGATT 17100 

GGAAATACCA CTATGATATT GATTTGCCTT CGTATGGTTA TGCCTATTCT GCGATTACAG 17160 

AATTGCCAAA TCATCACATA GGTGTACTGT TTGAAAAATA TGATTCGTGG TCGAGAAATG 17220 

AATTGCATTT AAGCAATGTA GTTCAGTATA TAGATTTGGA AATTAATGAT TTAACAAAAT 17280 

AAAGGAGAAA AACATGGTTA AATACGGTGT TGTTGGAACA GGGTATTTTG GAGCTGAATT 17 340 

GGCTCGCTAC ATGCAAAAGA ATGATGGAGC AGAGATTACT CTTCTCTATG ATCCAGATAA 17400 

TGCAGAGGCG ATTGCAGAAG AATTGGGAGC AAAAGTAGCA AGTTCCTTAG ATGAGTTGGT 17460 

TTCTAGCGAT GAAGTAGATT GTGTTATCGT CGCAACTCCA AATAATCTTC ATAAGGAACC 17 52 0 
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GGTTATTAAG GCTGCACAGC ATGGTAAAAA TGTTTTCTGT GAAAAACCAA TTGCGCTTTC 175 8 0 

TTATCAAGAT TGTCGCGAGA TGGTAGATGC GTGTAAAGAA AACAATGTAA CCTTTATGGC 17 640 

AGGACATATT ATGAATTTCT TTAATGGTGT TCATCATGCA AAAGAACTCA TTAATCAAGG 17700 

AGTTATCGGA GACGTTCTAT ATTGTCATAC AGCTCGTAAT GGTTGGGAAG AACAACAACC 17760 

GTCAGTATCA TGGAAAAAAA TTCGTGAAAA ATCAGGTGGT CACTTGTATC ACCACATCCA 1782 0 

TGAATTGGAT TGCGTTCAAT TCCTTATGGG GGGCATGCCT GAAACTGTAA CCATGACAGG 17 88 0 

TGGAAATGTG GCCCATGAAG GTGAACATTT CGGTGATGAA GATGATATGA TTTTTGTCAA 17940 

TATGGAATTT TCTAATAAGC GTTTTGCCTT GTTAGAATGG GGTTCAGCTT ATCGTTGGGG 18000 

TGAACATTAT GTCTTAATCC AAGGAAGCAA AGGTGCCATC CGCTTAGACT TATTCAACTG 18060 

TAAAGGAACT CTTAAGCTAG ATGGGCAAGA AAGCTATTTC TTGATTCACG AATCGCAAGA 18120 

AGAAGATGAT GATCGGACTC GTATCTATCA TAGTACAGAG ATGGATGGAG CAATTGCTTA 18180 

TGGTAAACCA GGTAAACGTA CTCCATTATG GCTATCATCT GTCATTGATA AAGAAATGCG 18240 

CTATCTGCAT GAGATTATGG AAGGAGCTCC AGTATCAGAA GAATTTGCAA AACTTTTGAC 18300 

AGGTGAAGCT GCCCTAGAAG CAATTGCTAC TGCAGATGCT TGTACCCAGT CTATGTTTGA 18360 

AGATCGCAAA GTAAAATTGT CAGAAATTGT AAAATAAATT TTGGTATTCT CCTATTTATA 18420 

GGTCGACTTG CTCCTCTGAA AGTACTTTTA GAGGAGCTGT TTGACTTTGC TAGTTTTTGA 18480 

AACTGAAATC TATTATACTA CAAACTATTG AAAGCGTTTT AATTTTAAGG TATAATAATC 18 540 

TCATAGAAAT AAAGAAAAGG AGGAAAGAGG ATGCCACAGA TTAGCAAAGA AGCCTTGATT 18 600 

GAGCAAATCA AAGATGGAAT CATCGTTTCT TGTCAGGCTC TTCCTCATGA ACCGCTTTAT 18 660 

ACAGAAGCGG GAGGGGTGAT TCCCTTGCTG GTCAAAGCGG CTGAGCAAGG TGGAGCAGTC 18720 

GGTATCCGAG CAAACAGTGT TCGCGATATC AAGGAAATTA AGGAAGTCAC TAAACTTCCA 187 80 

ATCATTGGGA TTATCAAACG TGATTATCCA CCTCAGGAAC CCTTCATCAC GGCTACTATG 18840 

AAAGAAGTTG ATGAATTGGC AGAACTGGAC ATCGAGGTGA TTGCTCTGGA TTGTACCAAG 18900 

CGTGAACGCT ACGATGGTTT GGAAATTCAA GAGTTCATTC GTCAGGTTAA GGAGAAATAT 18960 

CCTAATCAGC TTTTGATGGC TGATACTAGT ATCTTCGAAG AAGGGCTAGC AGCTGTAGAA 19 020 

GCAGGAATTG ACTTTGTCGG AACAACCTTA TCAGGCTACA CATCCTACAG TCCAAAAGTA 19080 

GACGGTCCAG ATTTTGAATT GATTAAGAAA CTCTGTGATG CTGGTGTAGA TGTCATTGCA 19140 

GAAGGAAAAA TTCATACACC AGAACAAGCC AAACAAATCC TTGAATATGG AGTGCGAGGC 192 00 

ATCGTTGTTG GTGGCGCCAT TACTAGACCA AAAGAGATTA CAGAACGCTT CGTTGCTAGT 192 60 

CTTAAATAAG ATGTGAGGGG GAGTTTTATG TTTAAAGTTT TACAAAAAGT TGGAAAAGCT 1932 0 
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TTTATGTTAC CTATAGCTAT ACTTCCTGCA GCAGGTCTAC TTTTGGGGAT TGGTGGTGCA 193 80 

CTTTCAAACC CAACCACGAT AGCAACTTAT CCAATACTAG ACAATAGTAT TTTTCAATCA 19440 

ATATTCCAAG TAATGAGCTC TGCAGGAGAG GTTGTATTCA GTAATTTGTC ACTACTTCTC 19500 

TGTGTGGGAT TATGTATTGG CTTAGCGAAA CGAGATAAAG GAACCGCTGC GTTAGCAGGA 195 60 

GTAACTGGTT ACTTAGTTAT GACTGCAACG ATCAAAGCTT TGGTAAAACT TTTTATGGCA 19620 

GAAGGATCTG CAATTGATAC TGGAGTTATT GGAGCATTAG TTGTCGGAAT AGTTGCCGTA 19 680 

TATTTGCACA ACCGATATAA CAATATTCAA TTACCTTCCG CTTTAGGATT CTTTGGAGGT 1974 0 

TCACGCTTCG TTCCTATTGT TACATCGTTC TCTTCTATCT TGATTGGCTT TGTCTTCTTT 19800 

GTTATTTGGC CACCTTTCCA ACAACTTCTT GTTTCTACAG GTGGATATAT TTCTCAGGCG 19860 

GGTCCAATTG GAACTTTTCT ATATGGATTT TTAATGAGAC TTTCTGGAGC AGTAGGCTTA 19920 

CATCATATAA TTTACCCTAT GTTTTGGTAT ACTGAACTTG GTGGTGTTGA AACTGTTGCA 19980 

GGACAAACAG TGGTTGGAGC TCAAAAAATA TTTTTTGCTC AATTAGCCGA TTTGGCCCAT 2 0040 

TCTGGATTAT TTACAGAAGG AACAAGGTTT TTTGCAGGTC GTTTCTCAAC AATGATGTTC 2 0100 

GGTTTACCGG CTGCCTGTTT AGCGATGTAC CATAGTGTTC CTAAAAATCG TCGTAAAAAA 2 0160 

TACGCGGGTT TGTTTTTTGG AGTTGCTTTA ACATCTTTTA TTACCGGTAT TACAGAACCA 20220 

ATTGAATTTA TGTTTCTATT CGTCAGTCCG GTTCTATATG TTGTTCACGC ATTCCTTGAT 202 80 

GGTGTTAGCT TCTTTATTGC AGACGTCTTA AATATTTCAA TAGGAAACAC ATTTTCAGGA 2 0340 

GGTGTAATCG ATTTCACTTT ATTTGGAATT TTGCAGGGGA ACGCTAAGAC GAATTGGGTT 2 0400 

CTTCAGATTC CATTTGGACT TATTTGGAGT GTTTTGTATT ATATTATTTT TAGATGGTTC 2 04 60 

ATTACTCAAT TCAACGTTCT AACGCCAGGG CGAGGAGAAG AAGTAGATTC TAAAGAAATT 2 0 520 

TCTGAATCCG CAGATTCAAC TTCAAATACT GCAGATTATT TAAAACAGGA TAGCCTACAA 2 05 80 

ATTATCAGAG CCTTGGGTGG ATCAAATAAT ATAGAAGATG TAGATGCTTG TGTGACACGT 2 0 640 

TTACGTGTAG CTGTAAAAGA AGTTAATCAA GTTGATAAAG CACTTTTAAA ACAAATTGGT 207 00 

GCAGTTGATG TCTTAGAAGT GAAGGGTGGC ATTCAAGCAA TCTATGGAGC AAAAGCAATC 207 60 

TTATATAAAA ATAGTATTAA TGAAATTTTA GGTGTAGATG ATTAAGTACT TACTGACTTA 20820 

ATAAAAAACA GAGGAGAGTG ATGGATGAGT AGGATGAAAT GAAATCGCAT ACAAGAAATA 20880 

AAGAACTCAT TATCCAAGTT GGATACGCTT ATTACATAGG AGAATACAAA TGAAATTTAG 20940 

AAAATTAGCT TGTACAGTAC TTGCGGGTGC TGCGGTTCTT GGTCTTGCTG CTTGTGGCAA 21000 

TTCTGGCGGA AGTAAAGATG CTGCCAAATC AGGTGGTGAC GGTGCCAAAA CAGAAATCAC 21060 
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TTGGTGGGCA TTCCCAGTAT TTACCCAAGA AAAAACTGGT GACGGTGTTG GAACTTATGA 2112 0 

AAAAT CAAT C ATCGAAGCGT TTGAAAAAGC AAACCCAGAT ATAAAAGTGA AATTGGAAAC 2118 0 

CATCGACTTC AAGTCAGGTC CTGAAAAAAT CACAACAGCC ATCGAAGCAG GAACAGCTCC 21240 

AGACGTACTC TTTGATGCAC CAGGACGTAT CATCCAATAC GGTAAAAACG GTAAATTGGC 213 00 

TGAGTTGAAT GACCTCTTCA CAGATGAATT TGTTAAAGAT GTCAACAATG AAAACATCGT 213 60 

ACAAGCAAGT AAAGCTGGAG ACAAGGCTTA TATGTATCCG ATTAGTTCTG CCCCATTCTA 2142 0 

CATGGCAATG AACAAGAAAA TGTTAGAAGA TGCTGGAGTA GCAAACCTTG TAAAAGAAGG 21480 

TTGGACAACT GATGATTTTG AAAAAGTATT GAAAGCACTT AAAGACAAGG GTTACACACC 21540 

AGGTTCATTG TTCAGTTCTG GTCAAGGGGG AGACCAAGGA ACACGTGCCT TTATCTCTAA 21600 

CCTTTATAGC GGTTCTGTAA CAGATGAAAA AGTTAGCAAA TATACAACTG ATGATCCTAA 21660 

ATTCGTCAAA GGTCTTGAAA AAGCAACTAG CTGGATTAAA GACAATTTGA TCAATAATGG 2172 0 

TTCACAATTT GACGGTGGGG CAGATATCCA AAACTTTGCC AACGGTCAAA CATCTTACAC 217 80 

AATCCTTTGG GCACCAGCTC AAAATGGTAT CCAAGCTAAA CTTTTAGAAG CAAGTAAGGT 2184 0 

AGAAGTGGTA GAAGTACCAT TCCCATCAGA CGAAGGTAAG CCAGCTCTTG AGTACCTTGT 21900 

AAACGGGTTT GCAGTATTCA ACAATAAAGA CGACAAGAAA GTCGCTGCAT CTAAGAAATT 21960 

CATCCAGTTT ATCGCAGATG ACAAGGAGTG GGGACCTAAA GACGTAGTTC GTACAGGTGC 22020 

TTTCCCAGTC CGTACTTCAT TTGGAAAACT TTATGAAGAC AAACGCATGG AAACAATCAG 22080 

CGGCTGGACT CAAT AC TACT CACCATACTA CAACACTATT GATGGATTTG CTGAAATGAG 22140 

AACACTTTGG TTCCCAATGT TGCAATCTGT ATCAAATGGT GACGAAAAAC CAGCAGATGC 22200 

TTTGAAAGCC TTCACTGAAA AAGCGAACGA AACAATCAAA AAAGCTATGA AACAATAGTC 2 2 260 

CTTAGTTATT CTATAAAAAG TAGTTTTTTA AAGAACCTAA GAGTGTATAC CCCCTTTTCC 22 320 

CTCTACACAG ATAGTGTAAG AAAAGGGGGC TTTTGTTTAA AATGTAAGAA ACTGTCACGA 22380 

AATTAAAATG AAGTTCTTAC ATAAGCGAAT CATAAAAAAT TTCATTTTGA TTTTAAAACA 2 2 440 

GTTCAAGAAA GTCAAAAAAT TATTCTATTT GAAAGAGAGG TGCCGACTGT GAAAGTCAAT 2 2 500 

AAAATCCGTA TGCGGGAAAC AGTGATTTCC TACGCTTTCC TAGCACCAGT ATTATTCTTC 22560 

TTTGTCATCT TTGTGTTGGC TCCGATGGTG ATGGGCTTCA TTACAAGTTT CTTTAACTAC 22 620 

TCAATGACTA AATTTGAGTT TGTAGGCTTG GATAACTATA TCCGTATGTT TAAAGATCCT 22 680 

GTCTTTACAA AATCTCTGAT TAACACAGTT ATTTTGGTTA TTGGATCTGT ACCAGTTGTT 22 740 

GTTCTATTCT CACTCTTTGT AGCATCTCAG ACCTATCATC AAAATGTCAT TGCCAGATCC 22 800 

TTCTACCGTT TCGTCTTCTT CCTTCCTGTT GTAACGGGTA GTGTTGCCGT GACAGTTGTT 22860 
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TGGAAATGGA TTTATGACCC ACTATCAGGG ATTCTAAACT TTGTCCTTAA GTCCAGCCAC 2 2920 

ATCATCAGCC AAAACATTTC TTGGTTGGGA GATAAAAACT GGGCATTGAT GGCGATTATG 22 9 80 

ATTATTCTCT TGACCACTTC AGTTGGTCAG CCCATCATCC TTTATATCGC TGCCATGGGG 23040 

AATATTGACA ATTCACTGGT TGAAGCGGCG CGTGTTGATG GTGCAACTGA GTTTCAAGTT 23100 

TTTTGGAAGA TTAAATGGCC AAGCCTTCTT CCAACAACTC TTTATATTGC AATCATCACA 23160 

ACAATTAACT CATTCCAGTG TTTCGCCTTG ATTCAGCTTT TGACATCTGG TGGTCCAAAC 23220 

TACTCAACAA GTACCTTGAT GTACTACCTT TACGAAAAAG CCTTCCAATT GACAGAATAC 232 80 

GGCTATGCCA ACACAATTGG TGTCTTCTTG GCAGTCATGA TTGCTATCGT AAGCTTTGTT 23 340 

CAATTTAAAG TACTTGGAAA CGACGTAGAA TACTAAAGAA AGGAGACAGC TATGCAATCT 234 00 

ACAGAAAAAA AACCATTAAC AGCCTTTACT GTTATTTCAA CAATCATTTT GCTCTTGTTG 23460 

ACTGTGCTGT TCATCTTTCC ATTCTACTGG ATTTTGACAG GGGCATTCAA ATCACAACCT 23520 

GATACAATTG TTATTCCTCC TCAGTGGTTC CCTAAAATGC CAACCATGGA AAACTTCCAA 23 5 80 

CAACTCATGG TGCAGAACCC TGCCTTGCAA TGGATGTGGA ACTCAGTATT TATCTCATTG 23640 

GTAACCATGT TCTTAGTTTG TGCAACCTCA TCTCTAGCAG GTTATGTATT GGCTAAAAAA 2 37 00 

CGTTTCTATG GTCAACGCAT TCTATTTGCT ATCTTTATCG CTGCTATGGC GCTTCCAAAA 237 60 

CAAGTTGTCC TTGTACCATT GGTACGTATC GTCAACTTCA TGGGAATCCA TGATACTCTC 23820 

TGGGCAGTTA TCTTGCCTTT GATTGGATGG CCATTCGGTG TCTTCCTCAT GAAACAGTTC 23880 

AGTGAAAATA TCCCTACAGA GTTGCTTGAA TCAGCTAAAA TCGACGGTTG TGGTGAGATT 23940 

CGTACCTTCT GGAGTGTAGC CTTCCCGATT GTGAAACCAG GGTTTGCAGC CCTTGCAATC 24000 

TTTACCTTCA TCAATACTTG GAATGACTAC TTCATGCAAT TGGTAATGTT GACTTCACGT 2 4060 

AACAATTTGA CCATCTCACT TGGGGTTGCG ACCATGCAGG CTGAAATGGC AACCAACTAT 24120 

GGTTTGATTA TGGCAGGAGC TGCCCTTGCT GCTGTTCCAA TCGTCACAGT CTTCCTAGTC 2 4180 

TTCCAAAAAT CCTTCACACA GGGTATTACT ATGGGAGCGG TCAAAGGATA ATACTCTGCG 2424 0 

AAAATCTCTT CAAACTACGT CAGCTTCACC TTGCCATACT TAAGTATTGC CTGCGGTTAG 24300 

CTTCCTAGTT TGTTCTTCAA TTTTCATTGA GTATAGGAAA ATCAATCTAT CAAGATACAG 24360 

AAGTATATTT TATAGATTTA GAGAATATAG AGGTTATAAG TGTCTACAAA ATGGAGGGTA 24420 

TGCAGTTACT TTATGAAGTT TTGTCAGACA CTTATAAACT TAAGAATGGT TTTAGTTAAC 2 4480 

TATCAGAAAC GAAGGAAAGA GTATGATTTT TGACGATTTG AAAAACATCA CCTTTTACAA 24540 

AGGGATTCAT CCTAATTTAG ACAAGGCTAT CGACTATCTC TACCAACATC GTAAGGATTC 24 600 
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TTTCGAATTA GGAAAGTATG ATATTGATGG AGATAAAGTC TTTCTAGTTG TTCAGGAAAA 246 60 

TGTCCTCAAT CAAGCTGAAA ATGATCAATT TGACTATCAT AAGAAC T AT G CAGATTTGCA 2 4720 

TTTGCTGGTA GAAGGACATG AATATTCGAG CTACGGTTCA CGTATCAAAG ACGAGGCAGT 24780 

AGCATTCGAC GAAGCGAGTG ACATTGGCTT TGTTCATTGT CATGAACACT ACCCACTCTT 24840 

GTTGGGTTAT CACAATTTTG CGATTTTCTT CCCAGGTGAG CCACATCAGC CAAATGGTTA 2 4900 

TGCAGGCATG GAAGAAAAGG TTCGAAAATA TCTCTTTAAA ATTTTGATTG ATTAAAAATA 24960 

GGATGAATTG TTTTTTTGTA AAGCTTTGAT AATACTCTAC CATGAAATTG ATCTTTGTGA 2 5020 

GGTAGAGAAA TGAGAATAAA ATATTTAAAA ATTGGTATCT TCTAAGTATG CTGCAAGAGC 25080 

TAGTTTCTTA GATGGACAGG GGATTACAGT TGATGAGATG GCTTGGATAA TTAGGGGCAT 25140 

TGTGAATGCA TTGATTGGTA GATACATAAA ATTAGGTACT TATGCGGCTA AGTATGGTAT 2 5200 

TAGTATGGCA CGCTCGATCT TAAGTAGGGT AGCTGCAACT GCAGCAGCAA GAGTAGGATT 2526 0 

ACTGACCAAG ATTTCTGGAT GGATTTTACG AGTAGCTGTG AATGTAGCTG ATGTATATGG 2 5320 

TAATTTTGCC AACAATATTG CTGCAGCTTG GGATGCATAT GATAAAATTC CTAACAATGG 2 5380 

TCGTATAAAC TTTTAAAATG CGAGAATGAA AGCACTTTGT ATTTTTTTAT TGAATATGTT 25440 

AGCTTGGACA GTGCTTGCAA TGATAATTCG TGGAGGGCTA GATGGATTTG ATAGGCATAC 2 5500 

TTGGAGTACT ATTTTAATTG CGTCGCTGTT CGGGGTATAT GATTATAAGC CCATAGATAA 25560 

AAATAGAAAA AAGTCCAAAA GAAAAAATAG ATTTGTTCAT GGTAGGGACT TATGAAAGCT 25620 

TTACTGACAA AAAAGAAAAC AGTTTACAAA GAAAAATGAT GGAGGAGCAA ACATGGCACA 2 5 680 

AAAAGGAGTA AGCCTTATCA AGGCAGCATT TGATACAGAT AACTTTCTCA TGCGTTTTAG 2 5740 

TGAGAAGGTC TTGGACATCG TGACAGCCAA TCTTCTTTTT GTCGTCTCTT GTTTACCCAT 2 5800 

CGTGACGATT GGAGTGGCTA AAATCAGCCT CTACGAGACC ATGTTCGAAG TTAAGAAGAG 2 5860 

CAGACGGGTG CCTGTTTTTA AAATCTATCT AAGATCTTTC AAGCAAAATC TGAAACTAGG 2 5 920 

TCTTCAGCTG GGTTTAATGG AGTTAGGAAT TGTGTTTCTT ACCCTTTCAG ATCTCTATCT 2 5 980 

TTTCTGGGGT CAAACAGCTC TGCCCTTCCA ATTGCTGAAA GCCATTTGTT TAGGTATTCT 2 6040 

GATTTTTCTT ACTATCGTGA TGCTGGCTAG TTACCCTATC GCGGCACGTT ATGACCTATC 2 6100 

TTGGAAAGAA ATTCTTCAAA AAGGATTGAT GTTGGCTAGT TTTAACTTTC CTTGGTTCTT 2 6160 

CCTCATGTTA GCCATTCTTG TCCTCATTGT GATGGTTCTT TATCTGTCCG CCTTCAGTCT 2 6220 

ACTCTTAGGT GGCTCAGTCT TCCTACTTTT TGGGTTTGGA CTATTGGTCT TTATCCAGAC 2 62 80 

TGGATTGATG GAGAAAATTT TCGCAAAATA CCAATAGGAG CTTTATTTCT GAAACTACTT 2 6340 

TCAAAGGCTC CAAACGCTAT TCTATAAGCG AGAAACTAAA ATCGG 2 63 85 
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(2) INFORMATION FOR SEQ ID NO : 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2716 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY : linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

CCTGCCCGCA TTGCCCTAGG CATTAAGTAA ACATATAAAA GCATGTGAGA GACTGTTGGA 60 

AAAGCGAGGA AATTTCCCCT CTTTTCCTCT AGTCTCTCCT TTCTTTTGCT GATTTTATTC 120 

AAAGAAAATG ATATAATAGT AGTTATGGAG AAAAAGAAAT TACGCATCAA TATGTTGAGT 180 

TCAAGTGAGA AAGTAGCAGG ACAGGGAGTT TCAGGTGCTT ACCGTGAATT AGTTCGTCTT 240 

CTTCACCGTG CTGCCAAGGA CCAATTGATT GTTACAGAAA ATCTTCCAAT CGAGGCAGAT 300 

GTGACTCACT TTCATACGAT TGATTTTCCC TATTATTTAT CAACCTTCCA AAAGAAACGC 3 60 

TCAGGGAGAA AGATTGGCTA TGTGCATTTC TTGCCAGCTA CACTTGAGGG AAGTTTGAAA 420 

ATTCCATTTT TCTTAAAGGG AATTGTGAAA CGCTATGTAT TTTCTTTTTA CAACCGGATG 4 80 

GAGCACTTGG TTGTGGTCAA TCCTATGTTT ATTGAGGATT TGGTAGCAGC TGGTATTCCA 540 

CGTGAAAAAG TGACCTATAT TCCTAACTTT GTCAACAAGG AAAAATGGCA TCCTCTACCA 600 

CAAGAAGAGG TAGTCAGACT GCGCACAGAT CTTGGTCTTA GTGACAATCA GTTTATCGTA 6 60 

GTAGGTGCTG GGCAAGTTCA GAAACGTAAA GGGATTGATG ACTTTATCCG TCTGGCTGAG 720 

GAATTGCCTC AGATTACCTT TATCTGGGCT GCTCCCTTCT CTTTTGGTGG TATGACAGAT 7 80 

GGTTATGAAC ACTATAAGAA AATTATGGAA AATCCCCCTA AAAATTTGAT TTTTCCAGGC 840 

ATTGTATCGC CAGAGCGGAT GCGCGAATTG TATGCTCTAG CGGATCTTTT CTTGTTGCCT 9 00 

AGTTACAATG AGCTCTTTCC TATGACTATT TTAGAAGCTG CGAGTTGTGA GGCTCCTATT 960 

ATGTTGCGTG ATTTAGATCT CTATAAGGTG ATTTTGGAGG GAAATTATCG GGCGACAGCG 102 0 

GGTAGAGAAG AGATGAAAGA GGCTATTTTG GAATATCAAG CAAATCCTGC TGTCTTAAAA 1080 

GATCTCAAAG AAAAGGCTAA GAATATTTCC AGAGAGTATT CTGAAGAGCA TCTGTTACAA 114 0 

ATCTGGTTGG ACTTTTATGA GAAACAAGCC GCTTTAGGGA GAAAGTAAAA AGTGAGGTAA 1200 

TCTATGCGAA TTGGTTTATT TACAGATACC TATTTTCCTC AGGTTTCTGG TGTTGCGACC 12 60 

AGTATTCGAA CCTTGAAAAC AGAACTTGAA AAGCAGGGAC ATGCTGTTTT TATCTTTACG 1320 

ACGACAGATA AGGATGTCAA TCGCTACGAA GATTGGCAAA TTATCCGCAT TCCAAGTGTT 13 8 0 
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CCTTTCTTTG CTTTTAAGGA TCGTCGCTTT GCCTACCGAG GTTTTAGCAA GGCACTTGAA 1440 

ATTGCTAAAC AGTATCAGCT AGATATTATC CATACTCAGA CAGAATTTTC TCTTGGCCTG 1500 

TTGGGGATTT GGATTGCGCG TGAATTGAAA ATTCCAGTCA TCCATACCTA TCACACCCAG 15 60 

TATGAAGACT ATGTCCATTA TATTGCTAAG GGGATGTTGA TCCGGCCGAC TATGGTCAAG 162 0 

TATCTGGTTA GAGGTTTCCT GCATGATGTG GATGGGGTTA TTTGCCCTAG TGAGATTGTC 16 8 0 

CGTGACTTGC TATCTGATTA TAAGGTCAAG GTTGAAAAAC GGGTCATTCC TACTGGGATT 1740 

GAATTAGCCA AGTTTGAGCG TCCGGAAATC AAGCAGGAAA ATTTGAAAGA ACTGCGTAGT 1800 

AAACTAGGGA TTCAAGATGG TGAAAAGACG TTGCTTAGTC TTTCGAGAAT CTCCTATGAA 1860 

AAAAATATTC AAGCAGTTTT AGCAGCCTTT GCTGATGTTC TGAAAGAGGA AGACAAGGTT 192 0 

AAACTGGTAG TAGCTGGGGA TGGCCCTTAT CTGAATGACC TCAAAGAGCA AGCCCAGAAC 19 8 0 

CTAGAGATTC AAGACTCAGT CATCTTTACA GGGATGATTG CTCCTAGTGA GACGGCTCTT 204 0 

TACTATAAAG CGGCGGATTT CTTCATTTCG GCATCGACAA GCGAAACGCA AGGTTTGACC 2100 

TACTTGGAAA GCTTAGCCAG TGGAACACCT GTCATTGCTC ACGGAAATCC TTATTTGAAC 2160 

AACCTCATCA GTGATAAAAT GTTTGGAACC TTGTACTATG GAGAACATGA TTTGGCTGGT 222 0 

GCTATTTTGG AAGCCCTGAT TGCAACACCA GACATGAACG AGCATACCTT ATCAGAGAAA 2280 

TTGTATGAGA TTTCAGCTGA GAACTTTGGG AAACGAGTGC ATGAGTTTTA TCTGGATGCC 2340 

ATTATTTCAA ATAACTTCCA GAAAGATTTG GCTAAAGATG ATACGGTCAG TCAGCGTATC 2400 

TTTAAGACAG TTTTGTATCT TCAGCAACAG GTGGTTGCTG TACCTGTAAA AGGATCTAGA 2460 

CGCATGTTGA AGGCTTCAAA AACACAGTTG ATCAGTATGA GAGACTATTG GAAAGACCAT 2520 

GAAGAATAGA AAGAGGAACA GCTATGAAAA AAACAATTAA TGAGAAGCGG TCGTGATAAA 2 580 

AAGATTGCGG GTGTTTGTGC TGGGGTGGCC CATTATCTGG ATATGGATCC GACTATCGTT 264 0 

CAAGTCATTT GGGGTGTTCT TACTTGCTGT TACGGAGCTG GAATTGTAGC TTACATTATT 2700 

TTATGGATTA TCGCGA 2716 
(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH : 13926 base pairs 

(B) TYPE : nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

CTTTGGTTTT GCCTTATTCA AGACATGAGG GCCATCAGGA ATGATCTGAA ACTGCGAATC 60 
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TGTTAACAGT CTATGGAGAG CTTTCATAGA ACTAAGATTC GGTTTATCTT TGCTGCCACA 12 0 

AATTAGTAAG GTTGGATAAG GGTAAGTTCC TGCTATATCC GTTAAATCAA GTGTCTTCAA 180 

CTCCTCAGAA ACTCCGACCA TAAGAGTCTT GTCTGCTCCC TGTTTTTCAA ATACTCTTTT 2 40 

GGGAAGTAGT TTAAAAATCA GCAATTGAAG ATAAAATAGG ATATTCCCTG CTAATTTAAG 3 00 

CGGGCATCCT GACAGAATCA AAGCTCGAAG ATTTGGTAAA TCGTAACTGG AAAGTTCTAG 3 60 

TGTCAGGGCA GCACCTAAGG ACAATCCAAT CAAAACAAAA GGTTCTGTCT CTTGAGCTAG 42 0 

GTGCTGATAA ACTCGCTCTT TAGCTTGTTG ATAGTTACTA ACTCCAGAAG GAAATAACTC 480 

GATAGCCTCA GAAGGATAAT CTGTCAGTAG ATTCCGAACT TCTTTCCAAG ACTCTGCTGA 540 

CTGCCCTAAC CCATGCAAAA ATATTAATTT CATCTAGTTC TCCTCAAGGC TTAATTCATA 600 

CAAGCCTCTC ACTGCATTAC AGCCGTAAAT AGCTTCTGCT TGGGTTAAAT CTGCCAAGGT 660 

CAAGACTTTC TCTTCTACCT GTCCTGTTTC TAGCAAATGC TGACGGTAAA TTCCTGGCAA 72 0 

GATTCCAAGT CGGATAGGCG GTGTGTAGAG TTTTCCAGCG ATTTTCAGAA CCAAATTTCC 780 

TATAGAGGTT TCAAGCAGTT CTCCTGACTT ATTGTGGTAA ATCTTCTCTT GTTCTCCTAG 84 0 

GCTCAAATGC GGTCGGTGAG TGGTTTTAAA GTAGGTAAAG GATTGATTCA AAGCAGCTTC 900 

CTGAAGACAG ACTTGGGCCT GACAAAAGCT TGTACTGAGA GGGGTTAATA CTTGACGATT 960 

GACTTCTATC TCTCCAGATT TGCTAAGGCT GATTCGCAAG CGGTAATCTC GATTAGCTTC 1020 

ACAATCCTGA CACTCTTCCT CAATCTTGTG TCCCAAGTCT TCTGCATCAA AAGGAAAAGC 1080 

AAAATAACGA CTAGCTTTTC TCAGCCTTTC CAGATGTTGT TCTTCAAACA TCAGTTGTTT 1140 

TTGGCTGATT TTTCCAGTTG TAATTAATTG GAAGCGAGCT TGTTTACGAT AGAGAACTGC 1200 

TGCCTTTTGA TGAACCTCTC GGTATTCAGA TTCCCATGTG CTATCCCAAG TAATCCCTCC 1260 

GCCAACTCCA TAAATGGCTT GACCTTTGTG AAGTTGAATG GTACGAATGG CCACATTAAA 1320 

AATCCGTCGT CCATTTGGAA GCAAGAGACC AATCGTTCCA CAGTAGACTC CACGCGGTTG 1380 

AGGCTCCAAG TCCTTGATAA TCTCCATTGT CGCAATTTTC GGTGCACCCG TTATGGAACC 1440 

ACAAGGAAAG AGTGAGCGGA AGATTTCAAC AAGGTCCACA TCCTCTCGCA ACTGACTCTT 1500 

GATGGTCGAA GTCATCTGCC AAACAGTTGA ATACTGCTCT ACCTGACACA GACGCTCCAC 1560 

GTGCTCGCTC CCAACTTCAG AAATACGGTT CATATCATTG CGCAAGAGGT CCACAATCAT 1620 

CATATTTTCA GAGCGATTTT TGGGATCCTG TTCCAACCAA CTGGCCTGTT CAAGATCTTC 16 80 

TTGGTCAGTT ACCCCACGCT GAGTCGTCCC CTTCATTGGT CGTGTTGTCA ACTCGCGATC 1740 

ATTTTGCTCA AAAAAGAGCT CTGGGCTCAT GGAAATCACT GTCATCTCGT CATGTTCCAC 1800 
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ATAGGCATTG TAGCCCGCCT CCTGCTCTAC CACCATACGA TTGTAGATGG CAAAAGGATT 1860 

GGCATTTAAC TTTTGCTTAA GTTGGACGGT GTAGTTGACC TGATAGGTAT CTCCCTGCCG 1920 

TAAATGATGG TGAATTTGGG CAATGGCCTT TTCATAGTCT GCTGCAGACG TTACTTCCTG 1980 

CCAATTTGAG GGCAAATCAA TATCCTCATA AGTCAGAGGA ATAGGGGAAG TTTCTACGAT 2 040 

ATCATGAACA GTAAAGTAAA GCAGGTACTC TCCCAGTAGG GGATCCTTGT GAACTGCTAA 2100 

TTTTTCCTCA AAAGCAGGTG CAGCCTCGTA GCTGACATAC CCCACCACAT AATAACCTTG 2160 

CTCTTGGTAG CTTTCCACTT GTGCCAGCAA ATCTGCCACT TCTTCTACAT TTCTCGTTTT 2220 

CAACTCTTTA ATAGGCTGGG TAAAGGTATA TCTCTCCCCC AAAGTCCTAA AATCAATCAC 2 2 80 

TGTTTTTCTA TGCATACCTT AAGTATAGCA TAAAATAAGA AAACCCTCAT CCGCAAAGCA 2340 

GATGAGAGAT TTCAATTATT TAAAGATTGA AGTTTTAAAG CTATTTGTTT GTTGAAGAAG 2400 

TTTCTTATAA ACAGCTTCTT TTAATTTAAC TGTATTATTC ATAGATACTG TTTTATTACC 2 4 60 

GTTTGCTTCT TGTTTAAGAG TTTCGGCATC TTTTTTAACA GCTTCTTTAA ACAATGTCAG 2 52 0 

TAAATCATCG TATGATGAAA CGGAAGAACC ATTTACTTCG AATGTTGTTA ATCCTTTCGT 2 580 

TGCTTTATCT TTAACTTCTT TGAAGTAAGC TTTTTTAAAT TCTTCAATAG TATTAAATGT 2 640 

ATTGTTAGAT ATTTTCTTGA TAATATATTC ATCACTTAGA ACAGACTCAC CATCTGTTTT 27 00 

AGATTGTTGT TTATATTTAT TTGAAGCATA ACCTAAGAAC CCATTTTCGT ATCCGTAGTA 2 7 60 

ACCCCATAAT CTAAAAGCAT TATGTTTGAA TGAAACAGCT CCAGGAGCAC CTTTACTAGT 2820 

ATTACCTCCG TAGATACCGG TCATCATTCT AACACCTACA TAAGGTGATT GATCGTTATA 2 8 80 

GCTAATTGCT TCGGGTTTAT AGATACCATT ACCTGGATTG CGATTAGTCA TTAATTGTTG 2 940 

ATCAACTAAA TCATTAACAG ATTGAATATT TAATTCATTT TTCTCTTCTT GACTTAGATT 3 000 

TCGAATTTTA TCCCATTGAT TTAATTTATT GTTATCACGG TATTCTCTAT CTATTTTTTT 3 060 

GAACCATGCA CTATTTAAAT CTTTATTTTG TTGAGAAATC ACAGATTCAG CCTCAATTTC 312 0 

ATCAAGAAGA GTTAAAGTGT CATTATAACC CTTCATATAT CTATTAATAT CTTCTCGTGT 3180 

TTTTAGAGTT TTTGGATCTG TAATATACCA CTGATTCCCA TCATTTTTGC GTTTAAATAC 32 4 0 

CATATTAATA CCTAAAGAAC CAAACTCATC AAATCCACTA CCAGTAACAG GAGTTTGTAG 3300 

CATACCCTGA GCATATGCTT CAGCATCAGT ACCTTCACGG TGTCCAAAGC CACCTAAGTA 33 60 

AATCGCACGG TCGTTGACGT GTGTTGTTTC ATGTGTGTAA ACTGAAATAC CGTATTCACC 3420 

AACCATTTCT AAATGAACAT ATTTTACATC AGTTCTAATA TCATCAGAGT TAGGATATAT 34 8 0 

AGCAGCATAA GCTCCTGTTC CATTATAATT ATAATACTTA TCCATAGGAC CAAAGAATTC 3540 

TCTAAGAGGA GTATATACTT TGTCGGTATT ATAGCGGCCA TATTTTTCAA CCCATCCACC 3600 
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AGGAGCGTTA TAACCTTCCC AAATAGGAAT AACAGCATCT CTTAGTAGTC GTTGTTTAAC 3 660 

GTTATCAGAC GCTAGACGAT ACCAGAAATC ATAATAGTTT CTATAACCAT CTGCAGCTTT 3720 

GTTAACGATA TCTTTAATAT CTTCTAATGA TTTTTTACCT AATCGCTCTG CACTACCAAA 3 780 

GGCAATTGCA TTATAATTTG AAATTAAATA AAGATGTGCT TTATCAATAT TCAGTAGTGG 3 840 

GAGTATAGTA TTTCTAAGGT GACTTCGTTT TAAATTATCG AATGCACGAT GTTTAGAATT 3 900 

TTTAATTTCT TCGACCTCAG AAGCGCGTTC TGCGATGTAG ACATGGTCTT CTGTAGCATC 3 960 

AATAAACCAA TCGTTCATAT TGTCTATATT TGTGAACAAT TGTCTATTAT AATTTAAAAA 4 020 

TGCATCTAAA TTACCTGATT TAGTATATTT AGCCAATACT TGACCGAATG CGTCGAATGT 4080 

ACGTGAACCT TTAATGTTGT TCTCTTTAGA ACCGATTTCA ATTAATCTGT CTAATACGCT 4140 

AACTTTTTCA CCATAGAAAT CTGGTTTGAA TAGCATTAAT TCTTTAATAT TAACATCACC 42 00 

AAATTTAACT CCATAGTAAC GATTTAGGTA AGTTAAACCT AGTAATAAAG CTGCTTTGTT 42 60 

TTTCTCGACT TTATCACGAA TCATTTGACG AGCAGCTGGA GAATCATTTA GTTGATGTTC 4 320 

TTCGTTTTGA ACTAATTTTG TGATTAGGTT TGTTAAGTTT TCTTTAACAT CTGTGAAGCT 4 3 80 

TTCTTCTAAA TATAAATCTT TGATTGCATT AACTCTATAG TCACCTAATC GATTTAGATG 4440 

CTGATACATC GTTTGAGACT GAAGCTCTAC TGATTCTAAA ATAGATTTTA TATCATTAAC 4 500 

AAGAGTAGTG TTATCTTTTT GAACGATATT AGGTGTATAT TTAATTCCTA AGTCAGTTAT 4 5 SO 

AGTATATTCT TTTACATTAC TTAAACCTTC ACTGCTAGAA GACAAGTTAA AGTAATCTTT 4 620 

TGTACCGTCC GCATAGTGAA CAATAATTTT ATTAGCTTCA TCTAGGTTTG TGATAAACTC 4 680 

ATTGTTGTTC ATCGCGGTAA CAGAAAGAAC TTCTTTAGTA TTTAGATGGT GTTCTTTATT 474 0 

TAATTTATTA CCTTGATATA CAATATAATC TTTATTGTAG AATGGTATTA ATTTTTCAAG 4 8 00 

ATTTTTATAG GCTTGGTTAT ATTCAGCGTT ATAATCTTGA ATACTAGAAT AGGCTTTTTC 4860 

TTCATTAAGT TTTGCAAGAG GAGATAGATC ACTTTCTAAT TTATCAGCAG TAATATTGAA 4 920 

AGTAGTAACT TTAGCATCAG CTTGTTCTTT AGTTAATTTA GTAAATGTTT TAGATTTCCT 4 980 

AAATGATCTA TTACCTGACG AATATCCCTC TACCGCATAT AAATCTTTTA TATGAGCACT 5 04 0 

AGCATAATCA GAATCATCAA CGTCGTTAGA GCCGAATAAC TCCTCTCCAC GGATAATCTT 5100 

AGCATAGCTG ACAGAATTAC TTACCGTACC TACAGGCCAA GTCTTACTTG CTATTGCTCC 5160 

AACTTCTACT GGATTTGAAA CATCTATTTT ACCTTTTACA ACCGACTCAG TTAGGAGAGC 52 2 0 

TTTTGTACCA ATAAGATGGT CTAGAGTTAA TCCATAATCT ACTTTAGGAA CTAACAAGCT 52 80 

GGCGCGTGTT TTGTTTCCTG TAATAGTAGC ATCAACATAT GCTTTTCTAA CAATTCCTCT 53 4 0 
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ATAGTTTGTA CCTGCAATTC CCCCTGTATG AGAGCCATTT CCACTTGTAG AGTGTAGTTT 54 00 

GCCAAAGAAA GCAACATTTT CAATACGAGT TCCATCATTC ATATTATTTA CAAATCCAGC 5460 

AACATTATTA CGACCTGAAA GTGTGCCTGT AATTTTGACA TTTGTAATAA CTGAAGAACC 552 0 

TTTCATAGTA TTGGCTAATG ATGCAATATT ATCTTGACCA GAACGTTCTA TCTCTACATT 5580 

TTCAAAATTC ACATTATTTA TCGTTGCGTT TGTTATCACA TTAAATAATG GATGTTCCAA 5640 

TTCAGTAATA GCAAATTGTT TTCCTTCAGA ACTTAAAAGT TTTCCTGTGA ATTCTTTAGT 5700 

GATATATGAT TTTCCATTAG GAACAACATT TCTAGCGCTC ATTGATTGTC CCAGACGATA 57 60 

TTCTTTTGAA GGATCGTTTT GAATAGCTTC CACTAATTCT TTGAAATTAT AATATACATT 582 0 

ATCTTCGTGG ACTTTAGGTT TTTCAATATA GTGAACGTAT TCTTCTTCAA ATTTATTATC 5880 

AGCAGTTCTA GAGACTAAAT TGTCTGCGAT TGCTGTAACT TTATATACAG GTGTTCCGTT 5940 

AACCGTAGTT TCTTCTATAT TTTTAACAGC TAGTAATGTA GTTTTCTGAT TATTTGAAGT 6000 

TATTTTTAAA TAATAATTGC TCTTATCATC AGGAATAGTT GTTATCAGTG ATTCATTAGT 60 60 

TTCTTTTCCA TTTTCGTATT TGATTAAATC TGTACGTTTA ATATTTTTAA GCTCAACTTT 612 0 

TTTAAGATCT AATTGAATAT TTTGATTTTC TAGAGTTTCA GTTTCTTCAC CGTTACCTCT 6180 

GTCGTAAATC ATAGTTGTAG ATAGGGTGTA TTCTTTGTAG TACTCTAGGT TCTTAAATGC 624 0 

AGCGCTTATA GTTTCTGTTG TTACCTTGTC ATCTGTAAGG ACTACAGTAT TAATAACTTC 6300 

TTCTCCTTTT TTCAATTCAG CTGTGATTGA TTTGATTTTT GTTTTGTTTT GATTTTCTAG 6360 

AGTATACTTA GCAACAGCTT CACGTTCCAA TATTTTCTTA TCGGTACTAG TCAATGTTAA 6420 

TATTGGCTTT TCAGATAATT CAACCAATTT TTCAATAGTT GCAGTTAATT TTTCAACAGC 6480 

TTCGTTAACT TCACTTTGTT TAGCATCTGT ATTAGCTGCA ACTTTTTCAG CCTTTGTAAC 6540 

TTCAGTTTGG AGGTTTTGCC AACTTCTATC ACTGTAATGT TCTTTTACCT TTGTTTTTGC 5600 

ATCTGCAATC GTATTGTTTA ATTCAGTTTT ATCAACGTTT AGAGCGTCAA TAGCCGTTTT 6660 

AAGTTTATTT GTCTCGCTAT TTACCTCAGG CTGTTTTACA GGCTCTGAAG CATAGACACC 672 0 

TTTTGCAGTT TCTAAAACAG GTCCAAGAGC ATTGTAACTT GCTGTAGAAT AATCAGTAGG 578 0 

AGAAACTGAA CTAGCTTTAT CAATTTGATT ATTTAACTCA CTTTTATCAA CTGGTTCTTT 5840 

AGTACCAATA CCCTTTATTT TATCTTCTGG TTTCGGTGTT TCCTCTACAG CCTTCTCTTC 6900 

TTCAGGAACT TCTGGTTGCT TTTCTGGCTC AACTGGTGCC GTTGGTGCCT GTTCGTCTTC 6960 

TCTTGGCGCG ACTGGTTCAC CTGCTTGTTC AACTTTTGGT TCCTCTGTTG GTTCTGTTTG 702 0 

TTTTTCTACA GCAGGCGTTT CAACTTTTGG TTGTTCAATA GATTGATTAA CAGTCTCCTC 7 08 0 

TTTTGGTTCT ACAGTTTCTT CAGCCTTGGT ATCTGGAGTT GACTCTTCTT GTTTCGGTGT 714 0 
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TTCCTCTACA GCCTTCTCTT 
CTTTTCGTCT TCTCTTGGCG 
TGGTTTGTCT GATGGTTGAC 
AACTTCTCCA CCTACTTCTT 
TACTTTAGGA AGGGTGTCGT 
TTCTTCTGTT TTAGGTGCTT 
TGTCCTAGCT TGCTCCTGAT 
CTCTCCAGGT TTTGCTGAGG 
GTCACCTGAT AGATAACCAA 
AGCCAGCGCT AGGGTCGCAA 
CATAGCTCCA ACTAGAAAGA 
CCCAACAGTC AGCAAACCAA 
CTGATCTTTT TGATACACCA 
AATTAAATCT TTAGCTTCTT 
GTCCACTACA GAAGGAGCCA 
AATCTTACGA ATTGAAAAAC 
AAACTTCCTA ATAGCATCTT 
ACTAGCCAGT GCCGTTACAT 
CTGAACATGC TTGACATGCA 
CTTGATAAAT TCAACCTCAA 
ACCTCCTGCA TGAAGAGTAG 
CTGACGTTCA ACAACGAGAG 
ACCTCTACTA CCTAGAATAT 
GGATTCTTTC CCAATATGAC 
GGCCCCCTGC TTCAAATTGG 
TTGAATCTGT GTCTCGTTCA 
TTTCTTGTTT TTAGCAAGAT 
CAGAATTGAC TGGGAGTTAG 
CTTTTTGGGG TCTAGAATCA 



CTTCAGGAGC TTCTGGTTGC TTTTCTGGCT CGACTGGTGC 72 00 

CGACTGGTTC ACCTGCTTGT TCAACTTTTG ATTCCTCAGC 72 60 

TTTCTGGCTT AACTGCTACT TTTTCCTCTG GTTTTGACTC 7320 

CAACTGGAGC TGGTTCTGCT GAATCTTCTT TCCCCTCTTC 7 3 80 

CAGTAGGTTT TACCTCCGAT TTTGGTTCTT CCTTTGGACT 7440 

CTTCTTTTGG AGCTTCCTCT GTCTCTACTA CTTGGTTTTC 7500 

TTGTTATTGA TTGAGGAGTC TCAACTTCGA CCACAGTCAC 7560 

TTTCTTCTAA AACAGTGTCC AAGCCAAGCG TTTTGAGGAT 7 620 

CATAGCGATA GCCCTCCATT TCAACAACAC CCTCTCGACT 7 680 

CTGGGTCTAC AGCCCCTGCA CTAGGAAGAA CTACCAATCC 7740 

CGCTAGCAAT TTTCTTTCTC TTGTAGATTA AAAGCAAGCT 7 800 

AAGCTGTCAA AACAGATGCT TCTGTCCCTG TTTGAGGCAA 7 8 60 

AACCATATAC AACTTCATTC CTGTCAGGCT TTCCTGTCTG 7 920 

GTGAAATAAT CTCTTTATTT ACATAGTGAT AGGTGGCTGC 7 980 

TCAAAAGGCT TCCAAGAAAT ACAGAGCCTA CAACTCCCTT 8040 

GGTCTTTTTT AAACACTTTT ATCTCCTTTA TTCATTCTCA 8100 

GCGGATAGTG CGCACGCGCA CCTCCGATTA ATTTTGGACG 8160 

GGGCATGACC AATCTCTCTC AAAATAGGGC GAATCGGAAC 8220 

TGCCAATTGC AGTGTCTCCG ATATCCAATC CAGCATGAGC 8280 

CTGCATCCTG CATAAACTTA AAGGCTGCCA ACTGCCCCGA 8340 

GATGGACACT GACAATTTCC AGACCAAACT GCTCTGCCAC 8400 

CCCGATTGAC ATGCTCACAA CCTTGAACTG CTAAATGGAT 84 60 

CCAAGATAGT CTCCACTATC AGCTCACCAA TCTCTTGACT 8520 

CACCTAGCAC CTCACTAGAA GAT AG ACC TA AAACAAAAAG 8580 

TCTTTTCTAA AACATCTTCC ACTACCTGAC GTGTTTCTCT 8640 

TCTCTGTTAC CTCTGTTGTC ACTCTTCTAT CATACCGTTT 8700 

AGACAACCTA GAAAGTTTGC CCAATTACGC ATAAAACTCC 87 60 

CTAGTTTCTA TTCTATTTAT ATATATTTCA ACTTTCGTCC 8820 

ATCTTCATAT GGTAATTGGC TCCAAAATGA AGTTTGAGCC 8880 
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GTTGATCGAC ATT T T G AAG A CCAACTCCCC CACGTTTGAG TTGACTTTGA CTACTATCAC 894 0 

CAGCATCTTG GAAGCCAACG CCATCATCCT CAATACGGAT GACCAATCCC GAATCCTGTT 9000 

TCTGGACAGA AAGTTTAATA TGGCCCTGAC CTTCCTTTTC CTTAATGCCA TGGTAAAGAG 9060 

CATTTTCTAC AAGGGGTTGT AGGACCAGCT TGGGTAAGAC TAAATTATCA AAGGCAACAT 9120 

TTTCATTAAT TTCGTATTCC AGCTTATCTC CATAGCGTTG TTTCTGGATA AAGAGATACT 9180 

GGCGGACATG ATTGATTTCG TCAGAGAGAC AAATCAAGTC CTTGCCTTGA TTGAGCGCCA 9240 

AGCGGAAATA GGTTGCCAAG GACTTGGTCA CCTGCACCAC TCGCTGACTA TCATGAAATT 9300 

CAGCCATCCA GATGATGGTG T C C AAAGTGT TATAGAGGAA ATGTGGATTA ATCTGGCTCG 9360 

AAAGGGCTTG AAGTTGGTAC TGACGGGTCG TTTCTTCCTG GCTACGAATA GCTACCATCA 9420 

ACTGATCAAT CTGATCCAAC ATAGCATTAA ATTGGCGAGT TACTTCTCTC AGTTCATAGG 9480 

CACCAACTTC CTTGGCACGA AGATTTTGAG CACCAGAAGC AATTTCCAAC ATGGTTTCTC 9 540 

TCAAATCCTT CAAAGGAGCA ATCCAGCGTT TAAGACTGAA CCACACTAAG CAGAGACAGA 9 600 

CAAGAAGAGA TGTGACACTG GCCCCAAGCA AGGTCCACAA GAGCTGACTC CGAACCTGGT 9660 

CTAACTTTTC CAATGATGAC ACGCCAAGCA CCGTCCAATC AGTTCCTGCA ATCTTCTCTT 9720 

GACTGACGTA GGATTTGTGA CCAGGAGTAT AACCCTGACC TGTATCGATG TAGGGTTTCA 97 80 

TAGCCTCCAT TTTGCTAGAC GAACTATAAA CTGTGTGTTG AGGATGGTAG ACAAATTCAT 9840 

GGTTTTCATT GATAATGAAG GCAAAGCCCT GCTGCCCCAA CTGGAGTTGA TTGAGATAGG 9900 

CTTCCAGAGT TTCATAAGAA ATATCCAAAC GAAGCACACC AAGATTGGCT CCCTTTGCAT 9960 

CAACAAGTTC TTGAGTGACA GAAATGACCC ACTGACTATC TGATTTACGA GCTGGAGTCA 10020 

AAACAGGCAT AGCTCCCTGA TGAATGGCCT TTTGGTACCA ATCCTCAGCC ATCATATCAG 10080 

AGGAAGTTTT CATCTGCACA CTGTCATCTG TAGAAATGAC CTGACCAGAT TTGGTCACCA 10140 

GCACAACAGT TTTCAAGTCC TTATCTGACT TCAAGATGGT CAAAAACAAA TCTCGGATTC 10200 

CCTCGACCTT GTCTTGACTG GGATTCTCAG CATAGGCCAG AACATCCGTC TGCTGGGTCA 102 60 

AACCAGTCGA GGTGGTTTCT AGTTTTTTGA TATAAGACTG AATAAAGTGG CTAGTCTGGC 1032 0 

TGATGGTCGT TTGGCTGTTG CCCTCAATGG TGGCCTCAAT GGCTGAAGAA CTTGATTGAT 103 80 

AGTAGAAAGT TCCAACCAGA GCTAGGAGAA TGAGAAAGAC CAGAAAGATG GAAATAACCA 1044 0 

TTCTAACTAA AAGAGAAGAA CGCTTCATCG GTCTTCTCCC TTCTTAAACT GACGAGGTGT 10500 

CACACCTGCA ATCTGCTTAA AACGTTGGGT AAAATAGTTC ATATCTTCAA AACCAACCTT 105 60 

CTCTGCGATC TCATAAATCT TCAGATCTGT AGTTAAAAGC AAGAGCTTGG CTTGTTTAAC 10620 

ACGTTCTCTC ACCAGATAAT CCTGAAAAGG CAAGCCCAAC TCTTTCTTAA TCAAGGAACT 10 680 
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CAGATAGGTC GGACTAAAAC CTAAGTCACT GGCTAAAGAC TTTAAACTAA ATTGGCTATC 10740 

AGCCAGATGA GACTGGATTT TCTGGGCCAT GTTTCCTTCA AACCTATTAG TCAATAAATC 10800 

TTGTAACTGC TCTTCTTTCT CTTCCTTGTC TAGTTTTTGT TTGATTTTCC CCAACATTTC 108 60 

CTCAATATCC TGACGAGAAA AGGGTTTGAG CAGGTAGTCG TCCACACCTA GTTTGACAGC 1092 0 

AGACAAGGCA TAATCAAAAT CATCGTAACC TGTTAAAAAG ACCAAATGAA CCTGAGGATA 109 8 0 

GGTTTCTCGT ACCAGACTGG CCAACTGGAT GCCATTTAGA TGAGGCATCT TGATATCGGT 11040 

TAAAATGATA TCTGGCACCT GCTTTTGGAT CAATTCCCAA GCCTGCCTTC CATTTTCAGC 11100 

CTGACCGATG ATTTCCATAT CGTAGGCTGC TACATTGACC AGTTTAGTCA AACCTTGTCT 11160 

TACCAGATAT TCATCTTCTA CGATTAAGAT TGTGTAGGTC ATGCTCTGCT CCTTTACCAC 1122 0 

TTACTAGTAT CAGTATAGCA AAATTCTCCT CTAACTGCTT AGGAAAGACC TCTTATACTC 112 8 0 

AATAAAAATC AAAAAGTAAA CTAGGAAGAT AGCCACAGGT TTCTCAAAGT ACCGCTTTGA 113 4 0 

GGTTGTAAAT AAAACTGACG AAGTCGACTC AAAGTATAGC TTTGAGGTTG TAGATAAAAC 114 00 

TGACGAAGTC GATAACCCTA CATACGGTAA GGCGACGCTG ACGTGGTTTG AAGAGATTTT 114 60 

CGAAGAGTAT TAATCAACAT AATCTAGTAA ATAAGCGTAc CTTTTTCTTC CATTTGGTCT 11520 

TTGGGAATAA AGCGGATAGA GAGGCTATTG ATACAGTAAC GTAAGCCGCC CTTGTCCTGT 115 8 0 

GGACCATCCG TAAAGACATG CCCAAGGTGA GAATCTCCTA CTCGGCTCCG CACTTCCATA 1164 0 

CGCGTCATAT TGTAGGACTT ATCTTCCTTG TAGGTGACAA CATCTGGACT GATGGGTTGG 117 00 

GTAAAACTAG GCCAGCCACA ACCAGACTCA AATTTGTCTT TTGATGAAAA GAGAGGTTCC 11760 

CCAGTTGCTA TATCCACATA GATACCGGAT TCAAATTTAT CCCAGTAACG GTTTGAGAAA 11820 

GCTCGTTCTG TTTGATTTTC CTGGGTAACT GCATACTCCT CAGGTGACAG GGTCTTTTTC 1188 0 

AATTCCTCAT CACTTGGTTT TGGATATTTG CTGGCATCAA TGACAGGATA GGCCGCCTGA 11940 

TTAACATTGA TATGGCAGTA GCCATTTGGA TTTTTCTTGA GATAGTCTTG ATGGTAATCC 12000 

TCAGCCACCA CAAAATTCTT CAAGTTTTCC TTTTCAACTG CTAGAGGTTG ATCGTATTTC 12 060 

TTAGCCACCT CATCAAAGAC TTGGTTAATC ACTTCCAAAT CCTTGTCATC TGTGTAATAA 1212 0 

ACACCAGTAC GGTACTGGGT CCCCACATCA TTTCCTTGTT TATTTTTGCT GGTTGGATTG 12180 

ATAATGCGGA AATAGTGAAG CAGGATTTCC TTGAGAGAAA TTTGCTTGGC ATCATAGGTG 12 240 

ACATGGACGG TTTCTGCATG ACCTGTTTGG TTAATCAATT CGTACTTGGT TGTTTCTCCT 12 300 

CTACCATTTG CATAGCCTGA AACGGCATCC GTCACCCCGG GAACACGTGA GAAATATTCC 12 360 

TCCACTCCCC AGAAACAACC TCCAGCTAGA TAAATTTCGT GCAAGTCTGC GTCTTTACTA 12420 
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ATTTCTGTTT TTTTCACTGC TTTTCCTCCT TGGCTAACTG CCGCCTTTTC AATTTGCGAG 12480 

GCATCTGTCT GCCCTGCATT TCGTATCAAT AGAACATAGA AACCGGTTAT GGCTAGAAAA 12540 

AATACTCCTA GCAACAAGAA GATTTTTAAC TTATCATTCA TAAGACGCCT CCTAGGCTAA 12 600 

TTCCTTCAAA GTTTGCAAAA TTGCATCTTT TTCCATGAAT CCTGGATGTG TTTTGACCAG 12660 

CTTGCCTTCT TTGTCTATAA AGGCTTGGGT TGGGTAAGAA CGGACACCAT AAGTTTCCAA 1272 0 

AAGTTTGCCT GATGGGTCAA CTAGGACTGG GAGATTTTTA TAATCCAATC CCTTATACCA 12780 

ATTCTTAAAG TCCGCTTCAG ATTGCTCTCC CTTATGTCCT GGTGACACTA CTGTCAAGAC 12 840 

CACATAGTCA TCACCAGCTT CTTTAGCAAT CTCATCCGTA TCTGGAAGAC TAGCCAGACA 12 900 

GATGGAACAC CAAGAAGCCC AGAATTTGAG ATAGACTTTC TTGCCCTTGT AAT C AG AT AA 12960 

ACGGTAGGTC TTGCCATCTA CTCCCATCAA TTCAAAATCA GCCACCTCTT TCCCTTTAGC 1302 0 

TGCGCTTGTT TTACTAGCTG TCTGCTCCGT CTTCATTTCA TCTTTCGTTT GGTGTTCACT 13080 

AGTCACGGAC TTGCCTGAAC AAGCCGTCAA ACAAAGGAGC GAACCTGCTC CAAGAACACA 1314 0 

TGTTTGCCAT TTTTTCATAT TGATATTCCT TTCCATTTTA TTCAAATAAT TGACTTAAAA 132 00 

TTGAAGCATT TCCA^ACAGA ACCAAGAAGC CCATCACAAT AATGAGAAAA CCACCCACTT 132 60 

TTTTGAGGAT TCCGAGATAG GGATGAAGTT TTCGGAAATG TTTCAAAACA TAACTAGAGG 1332 0 

TCAGAGCTAG AAGCAAGAAT GGTAGCGCCA AGCCCAGCGT ATACACCAAC ATGAGACCAG 133 80 

CTCCCTGCCA AGCTCCTGAA CCACCTGAAG CCGCCAAGGC CAAAACAGAC CCCAGAACCG 13440 

GCCCCACGCA AGGCGTCCAA GCAAAACTAA AGGTCAAGCC CAATAAAAAT GCCTGACTAT 135 00 

AGCCCTTACC ATTTTGCCCC TGTCCTTGCA GTTGTAGCCT CTTTTCCTTA TAAAGCCCCT 13560 

TAAAGTGTAG AATCTCCATT TGGTGCAAAC CAAGAAGGAT AATAATTGCC CCAGTAAGAT 13620 

ATTGGAACCA AGAAGCATAA AGCAAATCGC CTAAAAAACC AGCTCCATAG CCCAACAAAA 13 6 80 

TAAATATAAA GGAAATTCCT GCTATAAAGG CCAGAGTTCG TAATAAACTA GTAACTGAGA 13740 

TTGAAAATTT GCCGCTAGAA GCCTGAGCAC CATCCTTATC ATCTAGTAAC ACTCCTGTAT 13800 

AGACCGGTAA CAAAGGTAAG ATACAAGGAG AAAAGAAGGA TAGAATCCCT GCCAAAAAGA 138 60 

CACTTAGAAA AAAGAAAATA TGACCCATAA AGTTCCTCCT ATCATTTTAT TGATAGATTT 13 92 0 

ATT AT A 1392 6 
(2) INFORMATION FOR SEQ ID NO : 6: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 20199 base pairs 

(B) TYPE : nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY : linear 
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(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

CCCAGCAGAA AAATGGCATT TGGAGATAAT GGAAATCGTA AAAAAACTAT GTTTGAGAAA 60 

ATAACCTTGT TTATCGTGAT TATCATGCTA GTAGCAAGTT TATTGGGAAT TTTTGCAACT 12 0 

GCAATTGGTG CCCTCAGTAA TCTATAAAAT AGATTCAAGA AAATTTAGTG ACTGGGATTT 18 0 

CCCAGCCCTT TTTTAAAGTG AGAAGAAATA ATGAGTATGT TTTTAGATAC AGCTAAGATT 240 

AAGGTCAAGG CTGGTAATGG TGGCGATGGT ATGGTTGCCT TTCGTCGTGA AAAATATGTC 300 

CCTAATGGAG GCCCTTGGGG TGGTGATGGT GGTCGTGGAG GCAATGTGGT CTTCGTTGTA 360 

GACGAAGGAC TACGTACCTT GATGGATTTC CGCTACAATC GTCATTTCAA GGCTGATTCT 420 

GGTGAAAAAG GGATGACCAA AGGGATGCAT GGTCGTGGTG CTGAGGACCT TAGAGTTCGA 480 

GTACCACAAG GTACGACTGT TCGTGATGCG GAGACTGGCA AGGTTTTAAC AGATTTGATT 540 

GAACATGGGC AAGAATTTAT CGTTGCCCAC GGTGGTCGTG GTGGACGTGG AAATATTCGT 500 

TTCGCGACAC CAAAAAATCC TGCACCGGAA ATCTCTGAAA ATGGAGAACC AGGTCAGGAA 660 

CGTGAGTTAC AATTGGAACT AAAAATCTTG GCAGATGTCG GTTTAGTAGG ATTCCCATCT 720 

GTAGGGAAGT CAACACTTTT AAGTGTTATT ACCTCAGCTA AGCCTAAAAT TGGTGCCTAC 780 

CACTTTACCA CTATTGTACC AAATTTAGGT ATGGTTCGCA CCCAATCAGG TGAATCCTTT 840 

GCAGTAGCCG ACTTGCCAGG TTTGATTGAA GGGGCTAGTC AAGGTGTTGG TTTGGGAACT 900 

CAGTTCCTCC GTCACATCGA GCGTACACGT GTTATCCTTC ACATCATTGA TATGTCAGCT 960 

AGCGAGGGCC GTGATCCATA TGAGGACTAC CTAGCTATCA ATAAAGAGCT GGAGTCTTAC 1020 

AATCTTCGCC TCATGGAGCG TCCACAGATT ATTGTAGCTA ATAAGATGGA CATGCCTGAG 1080 

AGTCAGGAAA ATCTTGAAGA CTTTAAGAAA AAATTGGCTG AAAATTATGA TGAATTTGAA 1140 

GAGTTACCAG CTATCTTCCC AATTTCTGGA TTGACCAAGC AAGGTCTGGC AACACTTTTA 12 00 

GATGCTACAG CTGAATTGTT AGACAAGACA CCAGAATTTT TGCTCTACGA CGAGTCCGAT 12 60 

ATGGAAGAAG AAGCTTACTA TGGATTTGAC GAAGAAGAAA AAGCCTTTGA AATTAGTCGT 13 20 

GATGACGATG CGACATGGGT ACTTTCTGGT GAAAAACTCA TGAAACTCTT TAATATGACC 1380 

AACTTTGATC GTGATGAATC TGTCATGAAA TTTGCCCGTC AGCTTCGTGG TATGGGGGTT 1440 

GATGAAGCCC TTCGTGCGCG TGGAGCTAAA GATGGGGATT TGGTCCGCAT TGGTAAATTT 1500 

GAGTTTGAAT TTGTAGACTA GGAGACTGGT ATGGGAGATA AACCGATATC TTTCCGAGAT 15 60 

GCGGATGGTA ATTTTGTTTC CGCCGCAGAC GTTTGGAATG AAAAGAAATT GGAAGAACTA 1620 
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TTTAATCGTC TCAATCCAAA TCGTGCCTTG AGATTGGCAC GAACTAAAAA GGAAAATCCA 1680 

TCTCAGTAAA GAAGCTAAAA AATCCCGTGC CTCATCAGAC ACGGGATTTT GTGGTACGAC 1740 

AGGCATGTAT AGCAAACTGA ATCTGGAATA GCACAGCATA TCTTCTAAAA TATAGTAAAA 1800 

TGAAATGAGA ACAGGACAAA TCGATCAGGA CAGTAAAATC GATTTCTAAC AATGTTTTAT 1860 

AAG C AG AG AT GTACTATTCT AGTTTCAATC AACTATATTG TTATAAATTG ATTTGAATTT 1920 

CAAAATTAAA TTGTTTGATT CTTATTTCAA TTTGTTATAG TATATCTGAT GTCAAAGTTC 198 0 

TCGGCGAGTC AAATAGCGAT TCCCAAGCCT GACTATCGTG AGGTAGCGGA TTAAAATGGT 2040 

CTGGGGATAG ACCGTTTTAA GTCTGACGCT GGAAATAAGA ATTGTCAGAA GAAGGGATAG 2100 

CGAAATCGTG GCTCTACGAA CAGGAACGTG ATAATAAGGC GTATATAGCG GATAAGAGGG 2160 

CATCAAACTC TAAAGTCCAA AAAGGTAGTC GTAACCTATA TGCGTAAATC ACGAGAGTAA 2 22 0 

TTGAATTCGT ACTAAGATTT TCTATTTTCA CTGTAACCTT TTAACGCCCT TATATCTTGT 22 80 

ATACACGAGG AAAGATGTAC GACTTATCCC GTGAGGTCTA TCACTATAAA GAGAAAACGA 234 0 

CAGATAGAAG TGATCCTGAG TCACGGTTAT CTGTCTGATA GGACGGTATG TATAAAACGC 2400 

TTCTGTGAAC TGAGAGAAGG GGGAGAAGTT CTTGCTAAAA TTTAGTTGAA CAGCCGTATT 24 60 

CCGATACTTA GATAAGAGAT CTAGTCTTAG CTCCTACTCA GTTTTAGGGG ATAAAAAAGG 2 52 0 

GGCAATAGCG ATTCGAGAAA GATTATACTC TTCGAAAATC TCTTCAAATC ACGTCAATAT 25 80 

CGCCTTGTCG TATGTGTAGG ATACTGACTA CGTCAGTTCC ATCTACAACC TCAAAACAGT 264 0 

GTTTTGAGCA ACcTGCGGCT AGTTTCCTAG TTTGATCTTT GATTTTCATT GAGTATTAGT 27 00 

AATTCAGTTA CTAACTCGTC AACTCTGATT TATCCAATAA AATTGAAAAG GATGGAAAAA 27 60 

AGGATAAATT TATGATATAC TTTATTTTGA AGACCTTATT AGAAATCTTG AAAGAGTATT 282 0 

GAAAACTTAG AATGAGAAAA ATTGTTATCA ATGGTGGATT ACCACTGCAA GGTGAAATCA 288 0 

CTATTAGTGG TGCTAAAAAT AGTGTCGTTG CCTTAATTCC AGCTATTATC TTGGCTGATG 2940 

ATGTGGTGAC TTTGGATTGC GTTCCAGATA TTTCGGATGT AGCCAGTCTT GTCGAAATCA 3000 

TGGAATTGAT GGGAGCTACT GTTAAGCGTT ATGACGATGT ATTGGAGATT GACCCAAGAG 30 60 

GTGTTCAAAA TATTCCAATG CCTTATGGTA AAATTAACAG TCTTCGTGCA TCTTACTATT 312 0 

TTTATGGGAG CCTCTTAGGC CGTTTTGGTG AAGCGACAGT TGGTCTACCG GGAGGATGTG 3180 

ATCTTGGTCC TCGTCCGATT GACTTACACC TTAAGGCGTT TGAAGCTATG GGTGCCACTG 3240 

CTAGCTACGA GGGAGATAAC ATGAAGTTAT CTGCTAAAGA TACAGGACTT CATGGTGCAA 3 30 0 

GTATTTACAT GGATACGGTT AGTGTGGGAG CAACGATTAA TACGATGATT GCTGCGGTTA 3360 

AAGCAAATGG TCGTACTATT ATTGAAAATG CAGCCCGTGA ACCTGAGATT ATTGATGTAG 342 0 
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CTACTCTCTT GAATAATATG GGTGCCCATA TCCGTGGGGC AGGAACTAAT ATCATCATTA 3480 

TTGATGGTGT TGAAAGATTA CATGGGACAC GTCATCAGGT GATTCCAGAC CGCATTGAAG 3540 

CTGGAACATA TATATCTTTA GCTGCTGCAG TTGGTAAAGG AATTCGTATA AATAATGTTC 3 600 

TTTACGAACA CCTGGAAGGG TTTATTGCTA AGTTGGAAGA AATGGGAGTG AGAATGACTG 36 60 

TATCTGAAGA CAGCATTTTT GTCGAGGAAC AGTCTAATTT GAAAGCAATC AATATTAAGA 37 2 0 

CAGCTCCTTA CCCAGGCTTT GCAACTGATT TGCAACAACC GCTTACCCCT CTTTTACTAA 37 80 

GAGCGAATGG TCGTGGTACA ATTGTCGATA CGATTTACGA AAAACGTGTA AATCATGTTT 3 84 0 

TTGAACTAGC AAAGATGGAT GCGGATATTT CGACAACAAA TGGTCATATT TTGTACACGG 3900 

GTGGACGTGA TTTACGTGGG GCCAGTGTTA AAGCGACCGA CTTAAGAGCT GGGGCTGCAC 3960 

TAGTCATTGC TGGGCTTATG GCTGAAGGTA AAACTGAAAT TACCAATATC GAGTTTATCT 402 0 

TACGTGGTTA TTCTGATATT ATCGAAAAAT TACGTAATTT AGGAGCGGAT ATTAGACTTG 408 0 

TTGAGGATTA AACCGTAGAG GTGTTTATGA ATATTTGGAC CAAATTAGCA ATGTTTTCTT 414 0 

TTTTTGAAAC GGATCGCTTG TATTTGCGTC CTTTCTTTTT TAGTGATAGT CAGGACTTCC 4200 

GCGAGATAGC TTCAAATCCA GAAAATCTTC AATTTATTTT CCCAACGCAG GCAAGTCTGG 42 60 

AAGAAAGTCA ATATGCACTG GCCAATTACT TTATGAAGTC CCCTTTGGGA GTGTGGGCAA 4320 

TTTGTGACCA GAAAAATCAA CAAATGATTG GTTCTATTAA ATTTGAGAAG TTAGATGAAA 4380 

TCAAAAAAGA AGCTGAGCTT GGCTATTTTT TGAGAAAAGA TGCTTGGTCG CAAGGATTTA 4440 

TGACAGAGGT TGTTAGAAAA ATTTGTCAGC TTTCTTTTGA GGAATTTGGC TTAAAACAAT 4500 

TATTTATCAT TACCCACCTT GAAAATAAAG CTAGCCAAAG AGTTGCTCTT AAGTCTGGAT 4560 

TTAGTTTGTT CCGTCAGTTT AAGGGAAGTG ATCGTTACAC AAGAAAAATG CGGGATTATC 4 62 0 

TTGAATTTCG GTATGTAAAA GGAGAGTTCA ATGAGTAAGC ATCAGGAAAT TCTAAGCTAT 4 68 0 

TTGGAGGAAT TACCAGTAGG TAAAAGGGTC AGTGTTCGTA GCATTTCGAA TCATCTAGGA 4740 

GTTAGTGATG GAACAGCCTA TCGGGCTATT AAAGAAGCTG AAAACCGTGG AATTGTGGAG 4800 

ACCCGTCCTA GAAGTGGAAC AATTCGTGTT AAATCCCAGA AAGTTGCTAT AGAGAGATTA 4 86 0 

ACGTTTGCTG AAATTGCAGA AGTGACTTCT TCTGAGGTTC TGGCTGGGCA AGAAGGTTTA 4 920 

GAGAGAGAAT TTAGTAAGTT TTCAATTGGT GCCATGACTG AACAAAATAT CTTGTCTTAC 4 980 

CTTCATGATG GGGGGCTCTT GATTGTCGGA GACCGAACCC GTATTCAGTT GCTAGCCTTG 5040 

GAAAATGAAA ATGCAGTTCT GGTTACAGGG GGATTTCAGG TTCATGATGA TGTGCTTAAA 5100 

CTGGCCAATC AAAAAGGGAT TCCTGTTCTA AGAAGTAAGC ATGATACCTT TACCGTCGCG 5160 
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ACCATGATCA ATAAAGCCTT GTCAAATGTC CAAATCAAGA CTGATATTCT GACAGTTGAG 522 0 

AAACTTTATC GCCCTAGTCA TGAGTATGGT TTTCTGAGAG AGACAGATAC AGTTAAAGAT 52 80 

TATTTGGACT TGGTTCGTAA GAATCGTAGC AGCCGTTTCC CTGTTATCAA TCAACATCAG 5340 

GTCGTTGTTG GTGTTGTAAC CATGAGAGAC GCTGGTGATA AATCACCAAG CACGACAATT 5400 

GATAAGGTTA TGTCTCGTAG TCTATTTTTG GTTGGATTAT CGACAAATAT TGCCAATGTG 54 60 

AGTCAACGGA TGATCGCAGA AGACTTTGAA ATGGTACCAG TTGTTCGAAG CAATCAAACT 5520 

TTGCTTGGCG TTGTGACGCG ACGAGATGTC ATGGAGAAGA TGAGCCGTTC CCAAGTTTCG 558 0 

GCTCTACCAA CTTTTTCTGA GCAGATTGGA CAAAAGCTCT CTTATCACCA TGATGAAGTA 564 0 

GTCATTACAG TGGAACCCTT TATGCTAGAA AAAAATGGAG TTTTGGCTAA TGGTGTATTG 57 00 

GCAGAAATTC TGACCCACAT GACCCGATTT AGTTGTTAAT AGTGGTCGCA ATCTCATTAT 57 60 

CGAGCAGATG CTGATCTACT TTTTGCAGGC TGTTCAGATA GATGATATAT TGCGCATTCA 582 0 

GGCACGGATT ATTCATCATA CGAGACGGTC AGCTATAATT GATTACGATA TTTATCATGG 58 8 0 

TCACCAGATT GTTTCAAAAG CAAATGTGAC TGTTAAAATT AATTAGAAAC TAGGAGAAAA 594 0 

GATGATAACA TTAAAATCAG CTCGTGAAAT CGAAGCTATG GACAAGGCTG GTGATTTTCT 6000 

AGCAAGTATT CATATAGGCT TACGTGATTT GATTAAGCCA GGCGTAGATA TGTGGGAAGT 6060 

TGAAGAATAT GTCCGCCGTC GTTGTAAAGA AGAAAATTTC CTTCCACTTC AGATTGGGGT 612 0 

TGACGGTGCC ATGATGGACT ATCCTTATGC TACCTGTTGC TCTCTTAACG ATGAAGTGGC 6180 

TCACGCTTTC CCTCGTCATT ATATCTTGAA AGATGGTGAT TTGCTCAAAG TTGATATGGT 624 0 

TTTGGGAGGT CCCATTGCTA AATCTGACCT AAATGTCTCA AAATTAAACT TCAACAATGT 63 00 

TGAACAAATG AAAAAATACA CTCAGAGCTA TTCTGGTGGT TTAGCAGACT CATGTTGGGC 63 6 0 

TTATGCTGTT GGTACACCGT CCGAAGAAGT CAAAAACTTG ATGGATGTAA CCAAAGAAGC 642 0 

TATGTACAAG GGTATTGAGC AAGCTGTTGT TGGAAATCGT ATCGGTGATA TCGGTGCGGC 648 0 

TATTCAAGAA TACGCTGAAA GTCGTGGTTA CGGTGTAGTG CGTGATTTGG TTGGTCATGG 654 0 

TGTTGGCCCA ACTATGCACG AAGAACCAAT GGTTCCTAAC TATGGTATTG CAGGTCGTGG 6600 

ACTCCGTCTT CGTGAAGGAA TGGTCTTAAC CATTGAACCA ATGATCAATA CAGGCGATTG 6660 

GGAAATTGAT ACAGATATGA AAACTGGTTG GGCGCATAAG ACCATTGACG GTGGATTGTC 672 0 

ATGTCAGTAT GAACACCAAT TTGTCATTAC GAAAGATGGA CCTGTTATCT TGACTAGCCA 6780 

AGGTGAAGAA GGAACTTATT AATAAAAAGT GAAAAGACTA CTGGAAGTTT ATTTTGATAA 684 0 

AAAATCCAGT AGATCTTTTC ATAATAAAAC GCATTGTATC AAGTGTTAGG GGCTGATATC 6900 

ATGCGTTTTT CTGCTTTTAA GATTTTTTCC AACTCTGTTT GTAAGCGCAT CATAACAAAG 6960 
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GGTCTAGGAT TCAGGGCTCT CCTCCTATAT ACTATTAGTA AAGTAAAACT AAGGGAGGAT 702 0 

ATTTTAGTGT CGCAGTCTAT TGTTCCTGTA GAGATTCCAC AATATTGTCG TTTTGATTCT 7080 

AAAAAGAGAA ATGGAATTCT GTTTAATGTT CGTATTGCCA ATCTTAAATT TACTTTTTTA 714 0 

TATTATACTT CCTGCGAAAC AAAATATGGT ATAGTAGTTC TATGAATGAT GAAGCAAGTA 7200 

AACAACTAAC TGATGCACGA TTTAAGCGTC TTGTTGGTGT TCAGCGTACC ACTTTTGAAG 72 60 

AGATGTTAGC TGTATTAAAA ACAGCTTATC AACTTAAACA CGCAAAAGGT GGACGAAAAC 732 0 

CTAAATTAAG CCTAGAAGAC CTTCTTATGC CCACTCTTCA ATAGTGCGAG AATATCGAAC 73 80 

TTATGAAGAA ATTGCGGCTG ATTTTGGTAT TCACGAAAGC AACTTTATCC GTCGGAGCCA 7440 

ATGGGTTGAA ATAACTCTTG TTCAAAGTGG TTTTACGGTT TCAAGAACTC CTCTCAGTTC 7500 

TGAGGACACG GTAATGATTG ATGCGACGGA AGTAAAAATC AATCGCCCTA AAAAAACAAT 7560 

TAGCGAATGA TTCTGGTAAA AAGAAATTTC ACGCTATGAA GGCTCAAGCG ATTGTCACAA 7 62 0 

GTCAAGGGAG AATTGTTTCT TTGGATATCG CTGTGAACTA TAGTCATGAT ATGAAGTTGT 7 68 0 

TCAAAATGAG TCGTAGAAAT ATCGAACAAG CTGGTAAAAT CTTGGCTGAC AGTGGTTATC 774 0 

AAGGGCTCAT GAAGATATAT CCTCAAGCAC AAACTCCACG TAAATCCAGC AAACTCAAGC 7 800 

CGCTAACAGC TGAAGATAAA GCCTATAACC ATGCGCTATC TAAGGAAAGA AGCAAGGTTG 7860 

AGAACATCTT TGCCAAAGTA AAAACGTTTA AAATATTTTC AACAACCTAT CGAAATCATC 7920 

GTAAACGCTT CGGATTACGA ATGAATTTGA GTGCTGGTAT TATCAATCAT GAACTAGGAT 7 980 

TCTAGTTTTG CAGGAAGTCT ATTGAGGTAT TGAGCTAGTT TATGAAAAAA TTGGGTGAAA 8040 

AGTCGAGTGT TTTAGAAACC CACAGTGTAG TATTCTAGTT TCAATCCACT ATATTTTGCT 8100 

ACTCCCCGTA AAGTTTCTAT TTTCCCTGAT TTCTGATATA ATAGAAATAT TGACTTCAAG 8160 

AGTAAGGAAG AGAAGATGAA CGCATTATTA AATGGAATGA ATGACCGTCA GGCTGAGGCG 8220 

GTGCAAACGA CAGAAGGTCC CTTGCTAATC ATGGCAGGGG CTGGTTCTGG AAAGACTCGT 8280 

GTTTTGACCC ACCGTATCGC TTATTTGATT GATGAAAAGC TGGTCAATCC TTGGAATATC 8 340 

TTGGCCATTA CCTTTACCAA CAAGGCTGCG CGTGAGATGA AAGAGCGTGC TTATAGCCTC 8400 

AATCCAGCGA CTCAGGACTG TCTGATTGCG ACCTTCCACT CCATGTGTGT GCGTATTTTG 8460 

CGTCGCGATG CGGACCATAT TGGCTACAAT CGTAATTTTA CAATTGTGGA TCCTGGTGAA 8520 

CAGCGAACGC TCATGAAACG TATTCTCAAA CAGTTGAACT TGGACCCTAA AAAATGGAAT 8 580 

GAACGAACTA TTTTGGGGAC CATTTCCAAT GCTAAGAATG ATTTGATTGA TGATGTTGCT 8640 

TATGCTGCCC AAGCTGGCGA TATGTATACG CAAATTGTGG CCCAGTGTTA TACAGCCTAT 8700 
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CAAAAAGAAC TTCGTCAGTC TGAATCCGTT GACTTTGATG ATTTGATTAT GCTGACCTTG 87 60 

CGTCTCTTTG ATCAAAATCC TGATGTTTTG ACCTACTACC AGCAAAAATT CCAATACATC 882 0 

CACGTTGATG AGTACCAAGA TACCAACCAC GCTCAGTACC AATTGGTCAA ACTCTTGGCT 88 80 

TCCCGTTTTA AAAATATCTG TGTGGTTGGG GATGCGGACC AGTCTATCTA CGGTTGGCGT 8940 

GGTGCTGATA TGCAGAATAT CTTGGACTTT GAAAAGGATT ACCCCAAAGC CAAGGTTGTT 9000 

TTGTTGGAGG AAAATTACCG CTCAACCAAA ACCATTCTCC AAGCGGCCAA CGAGGTTATT 9060 

AAAAATAATA AAAATCGCCG TCCTAAAAAT CTCTGGACTC AAAACGCTGA TGGGGAGCAA 912 0 

ATCGTTTACT ATCGTGCCGA TGATGAGCTG GATGAGGCTG TATTTGTAGC CAGAACCATC 9180 

GATGAACTTA GTCGCAGTCA AAACTTCCTT CATAAGGATT TTGCAGTTCT CTATCGGACT 924 0 

AATGCCCAGT CCCGTACAAT TGAGGAAGCC CTGCTCAAGT CTAACATTCC TTATACCATG 9300 

GTTGGCGGAA CCAAATTCTA CAGCCGTAAG GAAATTCGCG ATATTATTGC TTATCTCAAC 93 60 

CTTATTGCTA ATTTGAGTGA CAATATTAGT TTTGAGCGTA TTATCAACGA GCCTAAACGT 942 0 

GGAATTGGTC TAGGTACAGT TGAGAAAATC CGTGATTTTG CAAATTTGCA AAATATGTCT 9480 

ATGCTGGATG CTTCTGCTAA TATTATGTTG TCTGGTATCA AGGGTAAGGC AGCCCAATCT 954 0 

ATCTGGGATT TTGCCAATAT GATGCTTGAT TTGCGGGAGC AGCTAGACCA CTTAAGCATT 960 0 

ACAGAGTTGG TTGAGTCCGT CCTAGAAAAA ACAGGTTATG TCGATATTCT TAACTCCCAA 9660 

GCGACTCTAG AAAGCAAGGC ACGGGTTGAA AATATCGAAG AGTTTCTTTC TGTTACGAAG 9720 

AACTTTGATG ACACCACGGA TGTGACAGAA GAGGAAACTG GTCTGGACAA ACTGAGTCGT 9780 

TTCTTAAATG ACTTGGCTTT GATTGCCGAC ACAGATTCAG GTAGTCAGGA GACATCAGAA 9840 

GTGACCTTGA TGACCCTGCA TGCTGCCAAA GGTCTCGAAT TTCCAGTTGT CTTTTTGATT 9900 

GGGATGGAAG AAAATGTCTT TCCACTTAGT CGTGCGACTG AAGATTCAGA TGAATTAGAA 9960 

GAAGAGCGCC GTCTAGCCTA TGTAGGTATC ACGCGTGCAG AGAAAATTCT CTATCTGACC 10020 

AATGCCAACT CACGCTTGCT TTTTGGTCGT ACCAATTATA ACCGTCCGAC TCGTTTTATT 10080 

AACGAAATCA GTTCAGACTT GCTTGAGTAT CAAGGTCTGG CTCGTCCTGC AAATACAAGC 10140 

TTTAAGGCAT C AT AT AG C AG TGGTAGTATT TCCTTTGGTC AAGGTATGAG TTTGGCTCAG 10200 

GCTCTTCAAG ACCGTAAACG CGGTGCTGCC CCAAAATCAA TCCAGTCAAG CGGTCTTCCA 10260 

TTTGGTCAAT TTACAGCTGG CGCAAAACCA GCATCTAGCG AGGCAAATTG GTCCATTGGT 10320 

GATATTGCTC TCCACAAGAA ATGGGGAGAG GGAACCGTTC TGGAAGTTTC AGGTAGCGGT 10380 

GCTAGGCAGG AATTGAAAAT CAATTTCCCA GAAGTAGGTT TGAAAAAACT TTTAGCCAGT 10440 

GTGGCTCCAA TTGAGAAAAA AATCTAATTT TCCATCCTTC TCACGAATAA TAAAGTGAGG 10500 
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AGGATTTTTA TGTACAGTAT TTCATTCCAA GAAGATTCAC TATTACCAAG AGAAAGGCTG 10 5 60 

GCCAAGGAAG GAGTTGAAGC GCTTAGTAAC CAAGAGTTGC TAGCTATTTT ACTCAGGACA 10 620 

GGAACACGTC AAGCTAGCGT TTTTGAAATT GCCCAAAAAG TCTTGAACAA TCTTTCAAGC 10 680 

CTAACGGATT TGAAAAAAAT GACCCTGCAG GAATTGCAGA GTTTGTCTGG TATTGGGCGT 10740 

GTTAAGGCCA TAGAATTACA AGCTATGATT GAACTGGGGC ATCGTATTCA CAAACACGAG 10 800 

ACTCTTGAAA TGGAAAGTAT TCTCAGCAGT CAAAAGTTGG CCAAGAAGAT GCAGCAGGAA 10860 

TTAGGGGATA AAAAACAAGA GCACCTGGTG GCACTCTATC TCAATACTCA AAATCAAATC 1092 0 

ATCCATCAGC AGACCATTTT TATCGGGTCT GTAACTCGTA GTATCGCTGA ACCGCGAGAG 10980 

ATTCTTCACT ATGCAATCAA GCATATGGCG ACTTCTCTTA TCTTGGTCCA CAATCATCCT 11040 

TCAGGAGCGG TAGCGCCTAG CCAAAATGAT GATCATGTCA CTAAACTTGT TAAAGAAGCC 11100 

TGCGAATTGA TGGGGATTGT TCTCTTGGAC CATTTGATTG TCTCTCATTC TAATTACTTT 11160 

AGTTATCGTG AAAAGACAGA TTTAATCTAA AGTTCATTAA CGACATAGTC AAAGAGTTTT 112 2 0 

TTATCTTTGG GACGATTTTC AAAAAGAAGT TCTGGATGCC ATTGGACACC GAGAAAGGCG 112 80 

ACATCATCCG TACTCATGAC AGCCTCAATG ATACCATCTT TAGGATCATG AGCCACAACT 11340 

TTTAAATTTG GTGCTAAGTC CTTGATGCTC TGGTGGTGGA AGGAGTTGAT ATGAGAGATT 114 00 

TCTCCATAGA TTTCTTGGAG AACGGTATCT GGTTCTGTTA CCAAGCGTTG AGTTGTGTAC 114 60 

TCAACAGAAG AATCCTGCCA ATGGTCTTCG ATATCTTGGT ACAAAGTTCC ACCCATGGCA 11520 

ACGTTAAAGA GTTGGGTACC ACGGCAGACA GAGAAAATGG GCTTTTTCTG TTTAATAGCT 11580 

TCCTTGATGA GGGCCAGTTC GAAGATATCT CTTTGAAGGT GATAGTCATC ACTATCAATG 11640 

GTTTTGGGTT CGCCATAAAA TTTTGGATCG ACATTTTGCC CACCTGTCAA GATGAGCTTG 11700 

TCAATCAAAC TGATATAGTG GCAGGCCATT TCTTGATCAC CAATCGGTAG GATGATGGGA 117 60 

ATCCCTCCAG CATCTTTAAC GCCTTCAACA AAGCCTTTTG CTGCGTAGCT CATCATGATG 1182 0 

TCATCATCTG GATGAGTTTT TTCGTTTCCT GTAATCCCAA TAACTGGTTT TTTCATAAAA 118 80 

TGATTTTCGC TTTCTAATCC TCTTTTCGCA TGAAGTAGAG GAGGGTTTGG AGTTCACTTG 1194 0 

TCAAATCGAC ATACTGAACG ACCACGTCTT TTGGTAAATG CAGATGGACT GGTGAAAAAC 1200 0 

TGAGAATTCC TTTCACACCA GCATCAACCA AGAGATTAGC AACCTCTTGT GACTTGACGC 12060 

TGGGAACAGT TAGGATAGCA GTCTTCACAT CAGCATCCTT GATTTTATCC TTGATCTGAG 12120 

AAATCCCGTA AATGGGAATC CCGTCAGGAG TTTGGGTACC GACTTCAGGA TGGTCGTCTA 12180 

GGTCAAAGGC CATGATAATC TTCATCTTGT TACGTTCGTG GAAGCGGTAG TGGAGAAGGG 122 4 0 
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CATGGCCCAT ATTTCCAATA CCAACCAGCA TGACATTGGT AATAGAGTTG TCATTGAGCA 123 00 

AATCGGCAAA AAATGTCATT AGTTTTTTGA CATCATAGCC AAAACCACGA CGACCAAGTT 123 60 

CACCAAAATA GGAAAAATCA CGACGTACGG TCGCTGAATC AATACCGATA GCCTCTGCAA 1242 0 

TTTGCTTAGA GTTGGCACGT TCAATCTTTT CTGCATGAAA TCTCTTAAAA ATTCGATAGT 124 80 

AGAGAGAGAG TCTTTTTGCT GTAGCTTTTG GAATAGCAAA CTGTTTATCT TTCACAAAAT 12540 

CACAACCTTT CTATTCTTCT ATTTTATAGA AACATTGTGA AAAAATCAAC AAAAATAAGA 12600 

AAAAACTAAG AAAAATCTTA GTTTTGATGT AAAAAATCTG CATGAGATAG AAAACGGTAG 12660 

AGGTCTCCGA CCAGCCCCTG ATAAACTTTT TTGCCCCTAA AAGTCAGAGA AGTCACATAA 12720 

AGTGTATCTG GTAAGGTTAC ACATCCTGAC AAAGTCAACA TGAGAGCCTC ATGATCCTCA 12780 

TACTTGAGAG TACGCTCTAC ATGATAGCAG TCCTTATAGG TCAGTTCAAA CATTTTGGCT 1284 0 

CTATCTTTCC GATTTTGTAA AGACACCACG TTCTACCAAG CTATCCATGA GGAAGTAGAA 12900 

TTTTTCCTGA TGAATATGGT GGTCTTCTGA TTTGAAAATA TCAACTAGAC GAAGGCCAAA 129 60 

CTTGTCAGTG ATATTGATTT TAGCCCCTGT AAGTTCCTTG TTAATGATGA TTTTGAGTTG 1302 0 

GAAGCCTTCA CCGCTGTTTG GCACTTTTTC CAAAAGGCGA GTCAGTTCAT AGTTACCAAC 1308 0 

CTTAGTTTCA AAAAAGGTGT TATCTTTGAG GGTGAATTTT TTAACAGAAG GGCTAAGAGT 1314 0 

GTAATCGTAA CGACAATTTT TTAACTGAAT GATTTTTTCA AATGCCATAT GGCTAACCTC 13200 

CGATAATTTC TTTTAAGGTT TTTGCGAGGG TTTGTAGGTC TTCAACGGTA TTTTGTGGCG 13260 

ACAAACTGAT GCGAAGGGAT TCCTTCAAGC GTTCTGAATT TGCGCCATAC ATGGCTTCAA 13320 

GAACATGGCT GGATTGGACA ACGCCTGCAG TACAGGCTGA GCCAGTAGAG ATTGAAATTC 13380 

CAGCTAAATC TAGCCGAAGG AGTAAGAGGT CATTTTTCTG ACCAGGAAAT CCAATATTGA 13440 

GAACATAAGG GAGATGATGT TTTCCTCTAT TCAGGTAATA CTGAATGCCC TCCAGCTCTG 13500 

CCAGAAAGGC AGTTTCTAGA TTTTGTACAT GTTGAAAATG TTCTTCTTGT TTTTCTAGGT 13560 

CTTCTTTTAG GGCTGCAACC ATGCCTACAA TGGCAGGCAG ATTTTCAGTT CCTGCACGTT 13 620 

TTTTCTGTTC CTGGTCTCCG CCATGTAGAT AGGAATCAAA GTCCATGCTA GATGCGTAGA 13680 

GAAAACCGAT TCCCTTAGGA CCATGGAATT TGTGGGCAGA AGCAGTGAGA AAATCAATGC 13 740 

CCAATTCTTC TGAATGAATT GGGATTTTAC CAATAGCCTG AACTGCATCA ACATGATAGG 13800 

CAGCAGGGTG TTGCTTGAGT ATTTGGCCAA TTTCAGCGAT GGGCAGTAGG TTTCCTGTCT 13860 

CATTATTGAC AAACATGGTA GAAACCAAAA TCGTATCGTC ACGTAAAGCC TTTTGAATTT 13 920 

GCTGGGCTGT GATTTCTTGA TTTTCTGGCT GGATAATGGT TGCTTCAAAC CCAAAGTGTT 13 980 

GAACCAAGTA ATCAATTGTT TCAAGGACAG CATGGTGCTC GATGGCAGTT GTGATGATAT 14040 
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GTTTTCCTTG TTCTTGGTGA CGAAGACAGT AGCCAATGAT GGTAGTATTA TTGCCTTCAG 14100 

TCCCACCAGA AGTGAAAAAG ATATGTTGAG GTTTTGTCCT TAGTAACTGG GCTAGTTCCT 1416 0 

GACGGGCTTC TCGCAAGAGT TTGCCAGCTT GACGACCATG ACCATGAATA CTAGAAGGAT 14220 

TTCCGTGGGT TTCTTGCATA ACCTTGGTCA TAGCTGAAAT AGCAACTGCT GACATAGGAG 14280 

TCGTTGCAGC ATTGTCCAAA TAAATCAAAG AATCACCTTA TTTCTTTTTA TTGTAGGCAA 14340 

AGAGTGGGCT GACTGGTTTT CTTTCGTGAA TACGGACGAT AGCATCACCA ATTAACTCAC 14400 

TAGCAGTGAT GTAGCATACA TTTTTAGGAG TTTTTTCTTT TGTTGCTACT GAATCAGTCA 144 60 

CAAGAATTTC TTTAATATTA GTATTGTCAA GAAGCTCAGC AGCTCCCTCG ACGAAGAGAC 14520 

CGTGGCTAGA AACAGCATAA ATTTCTGTAG CTCCTTCACG TTCAACGATT TTAGAAGCTT 1458 0 

CAGAGAAGGT ACGTCCTGTA TTTAAAATAT CATCAATCAA GATAGCTTTC TTACCTTCAA 14640 

CATCACCAAT AATATAACCT TCGTTACGAG TTGCATCGTC TTGAGGGTAG TCGATAATGG 14700 

CGATAGGAGC ATCAAGATAT TCAGCCAGGC TACGCGCACG TTTGACACCT GAATTTTTAG 14760 

GGCTAACGAC AACAACATCT GAACCAAGCA ATCCTTTATC G C AG T AATG T TTTGCGAATA 14820 

GGGGAACAGT GAAAAGATTA TCCACTGGAA TATCAAAGAA ACCTTGAACC TGAACGGCAT 14880 

GCAAATCAAG AGTCAGGATA CGATCAACTC CAGCCTTAAC CAGCATATTG GCAACTAGTT 14940 

TTGCTGTAAG TGGCTCACGA GGACAAGCAA TGCGGTCTTG ACGTGCATAG CCAAAATATG 13000 

GAAGGACAAC GTTGATACTG TGGGCACTTG CACGCACACA AGCATCGACC ATGATTAACA 15060 

ATTCCATTAG GTGGTTGTTG ACAGGGAAAC TTGTTGATTG GATGATGTAA ACATCATAAC 15120 

CACGCACACT TTCTTCGATA TTTACTTGGA TTTCTCCGTC TGAAAATTGA CGTGATGATA 15180 

GTTTTCCAAG TGGGACACCA ACAGCTTGGG CAATTTTTTG TGCAATCTCT TGGTTAGAGT 15240 

TGAGTGCGAA AAGTTTCATG TTTTTTCTAT CTGACATTAT AGACCGTCCT CTGTAAACTT 153 00 

TATAAATCCT AGTTATATTT ACCTTACATA TATGAACTGG GATTTGTGTA TTTTTATCTT 153 60 

TTCTATTTTA CCAAAAAATG GAGATTATTT CAGCTATTTT TCATACTTTT GACAAATCGA 15420 

ACCAATTTTG AAGGAGCTTT TTGATAGGAA ATCTGATTTT TCTCTAAAAA TTGTCGAAAA 15480 

TCCTGTTTGC CTTGCTCATG ATTTTCCACT TCAAGCTCCA ATTCGTAATC TGTTATATCA 15540 

AAGTATCGGC TCTGATCCAG TGCCATGAGA CCAATAGCTG TTTTCATTTC ATAGCGAAGC 15600 

GTTGTTAGAC AACCAAGAAC CTGCCAGTTC TTACTTTGGA TACCATGTTT CGCCAATTCA 15 660 

TCCAGTACTA GCCCTTGAGG AAGTTCTTCC TTACTCAGAT AGTTCTCAGC ATCTTTTAGT 15720 

TGCAATTTTT GGTTGTATTC CATGTTTCCA ACACTCTGCG GGACTTTGAG TGTCAACTCA 15780 
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GCCCAGTCTT CAAAGGTTCG AATGCGCATA GCGACTTTCT TTTCTCGCAG TTCAAAATCA 1584 0 

GGCGTGTCGA TGTAGTAATT TGTTTGAAGA ACAGGAGTGA CACCTGTGAA CTGGTCTTTT 15900 

AGACGATTGT ATTCATCTTT TTTCAATAGT GTTTTCAATT CAATTTCTAA ATGTTTCATT 159 60 

TTTCTTACCT TTTTTTATCG TTGAAAGCGG ATTTATGGTA TAATAAGCAT TGTATTTATT 16020 

GTATATGAAT CTGGAGAAAA AATCAAAGAT ATTTTTGACG GATAATATGA GAACAAGGGA 16080 

GAATATATGA CCTTAGAATG GGAAGAATTT CTAGATCCTT ACATTCAACC TGTTGGTGAG 16140 

TTAAAGATTA AACTTCGTGG TATTCGTAAG CAATATCGTA AGCAAAATAA GCATTCTCCA 16200 

ATTGAGTTTG TGACCGGTCG AGTCAAGCCA ATTGAGAGCA TCAAAGAAAA AATGGCTCGT 16260 

CGTGGCATTA CTTATGCGAC CTTGGAACAC GATTTGCAGG ATATTGCTGG CTTACGTGTG 16320 

ATGGTTCAGT TTGTAGATGA CGTCAAGGAA GTAGTGGATA TTTTGCACAA GCGTCAGGAT 163 80 

ATGCGAATCA TACAGGAGCG AGATTACATT ACTCATAGAA AAGCATCAGG CTATCGTTCC 1544 0 

TATCATGTGG TAGTAGAATA TACGGTTGAT ACCATCAATG GAGCTAAGAC TATTTTGGCA 16500 

GAAATTCAAA TTCGTACTTT GGCCATGAAT TTCTGGGCAA CGATAGAACA TTCTCTCAAC 16560 

TACAAGTACC AAGGGGATTT CCCAGATGAG ATTAAGAAGC GACTGGAAAT TACAGCTAGA 16620 

ATCGCCCATC AGTTGGATGA AGAAATGGGT GAAATTCGTG ATGATATCCA AGAAGCCCAG 1568 0 

GCACTTTTTG ATCCTTTGAG TAGAAAATTA AATGACGGTG TAGGAAACAG TGACGATACA 16740 

GATGAAGAAT ACAGGTAAAC GAATTGATCT GATAGCCAAT AGAAAACCGC AGAGTCAAAG 16800 

GGTTTTGTAT GAATTGCGAG ATCGTTTGAA GAGAAATCAG TTTATACTCA ATGATACCAA 15860 

TCCGGATATT GTCATTTCCA TTGGCGGGGA TGGTATGCTC TTGTCGGCCT TTCATAAGTA 16920 

CGAAAATCAG CTTGACAAGG TCCGCTTTAT CGGTCTTCAT ACTGGACATT TGGGCTTCTA 16980 

TACAGATTAT CGTGATTTTG AGTTGGACAA GCTAGTGACT AATTTGCAGC TAGATACTGG 17 040 

GGCAAGGGTT TCTTACCCTG TTCTGAATGT GAAGGTCTTT CTTGAAAATG GTGAAGTTAA 17100 

GATTTTCAGA GCACTCAACG AAGCCAGCAT CCGCAGGTCT GATCGAACCA TGGTGGCAGA 17160 

TATTGTAATA AATGGTGTTC CCTTTGAACG TTTTCGTGGA GACGGGCTAA CAGTTTCGAC 17220 

ACCGACTGGT AGTACTGCCT ATAACAAGTC TCTTGGCGGT GCTGTTTTAC ACCCTACCAT 17280 

TGAAGCTTTG CAATTAACGG AAATTGCCAG CCTTAATAAT CGTGTCTATC GAACACTGGG 17340 

CTCTTCCATT ATTGTGCCTA AGAAGGATAA GATTGAACTT ATTCCAACAA GAAACGATTA 17400 

TCATACTATT TCGGTTGACA ATAGCGTTTA TTCTTTCCGT AATATTGAGC GTATTGAGTA 17 460 

TCAAATCGAC CATCATAAGA TTCACTTTGT CGCGACTCCT AGCCATACCA GTTTCTGGAA 17 520 

CCGTGTTAAG GACGCCTTTA TCGGCGAGGT GGATGAATGA GGTTTGAATT T AT CG C AG AT 17 580 
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GAACATGTCA AGGTTAAGAC CTTCTTAAAA AAGCACGAGG TTTCTAAGGG ATTGCTGGCC IV 640 

AAGATTAAGT TTCGAGGTGG AGCTATTCTG GTCAATAATC AACCGCAAAA TGCAACGTAT 177 00 

CTATTGGACG TTGGAGACTA CGTTACCATT GACATTCCCG CTGAGAAAGG CTTTGAAACC 177 60 

TTGGAGGCTA TTGAGCTTCC ATTAGATATT CTCTATGAGG ATGACCACTT TCTAGTCTTG 17820 

AATAAACCCT ATGGAGTGGC TTCTATTCCT AGTGTCAATC ACTCTAATAC CATTGCCAAT 17 8 80 

TTTATCAAGG GTTACTATGT CAAGCAAAAT TATGAAAATC AGCAGGTTCA CATTGTTACC 17 940 

AGACTAGATA GGGATACTTG TGGCTTGATG CTCTTTGCCA AGCACGGTTA TGCCCATGCA 1800 0 

CGATTAGACA AGCAGTTGCA GAAGAAATCT ATCGAGAAAC GCTACTTTGC TTTGGTTAAG 180 60 

GGAGATGGAC ATTTGGAGCC AGAAGGGGAA ATTATTGCTC CGATTGCGCG TGATGAAGAT 1812 0 

TCCATTATTA CCAGACGAGT GGCTAAAGGC GGAAAGTATG CCCATACTTC ATACAAGATT 1818 0 

GTAGCTTCTT ATGGAAATAT TCACTTGGTC TATATTCACC TGCACACTGG TCGAACCCAT 18240 

CAAATCCGAG TCCATTTTTC TCATATCGGT TTTCCTTTGC TGGGAGATGA TTTGTATGGT 18300 

GGTAGTCTGG AAGATGGTAT TCAACGTCAG GCTCTGCATT GCCATTACCT ATCCTTTTAT 18360 

CATCCATTTT TAGAGCAAGA CTTGCAGTTA GAAAGTCCCT TGCCGGATGA TTTTAGTAAC 18420 

CTTATTACCC AGTTATCAAC TAATACTCTA TAAAAACTGT CTCAGAGTAT AATTATTATC 18480 

TTAAAGGAGA AAACTCATGG AAGTTTTTGA AAGTCTCAAA GCCAACCTTG TTGGTAAAAA 18540 

TGCTCGTATC GTTCTCCCTG AAGGGGAAGA GCCTCGTATT CTTCAAGCAA CAAAACGCTT 18600 

AGTAAAAGAA ACAGAAGTGA TTCCTGTTTT GCTTGGAAAT CCTGAAAAAA TTAAAATTTA 18660 

TCTTGAAATT GAAGGAATCA TGGATGGTTA TGAGGTCATC GACCCTCAAC ATTATCCTCA 18720 

ATTTGAAGAA ATGGTTTCTG CCTTGGTGGA GCGTCGCAAG GGCAAAATGA CTGAAGAAGA 18780 

TGTACGCAAG GTTTTGGTTG AAGATGTCAA CTACTTTGGT GTGATGTTGG TTTACTTGGG 18840 

CTTGGTTGAT GGAATGGTGT CAGGAGCGAT TCACTCAACA GCTTCAACAG TTCGCCCAGC 1890 0 

TCTACAAATC ATCAAAACTC GTCCAAATGT AACTCGTACT TCAGGAGCCT TCCTCATGGT 18960 

TCGTGGTACG GAACGTTACC TATTTGGAGA CTGTGCCATT AACATCAATC CAGATGCAGA 19020 

AGCCTTGGCT GAAATTGCCA TCAACTCAGC AATCACAGCT AAGATGTTTG GCATCGAACC 19080 

TAAAATTGCC ATGTTGAGCT ATTCTACTAA AGGTTCAGGG TTTGGTGAAA GCGTTGATAA 19140 

GGTCGTTGAA GCAACTAAAA TTGCTCACGA CTTGCGTCCT GACCTTGAAA TCGATGGTGA 19200 

GTTGCAATTT GATGCAGCCT TTGTTCCTGA AACTGCAGCT CTGAAAGCTC CTGGAAGTAC 19260 

GGTAGCTGGT CAAGCAAATG TCTTCATCTT CCCAGGTATC GAGGCAGGAA ATATTGGTTA 19320 
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CAAGATGGCT GAACGCCTGG GTGGCTTTGC GGCTGTAGGA CCTGTTTTGC AAGGTTTAAA 193 8 0 

CAAGCCAGTT AATGATCTTT CTCGTGGATG TAATGCAGAT GATGTTTACA AGTTGACCCT 1944 0 

CATCACAGCA GCTCAAGCAG TTCATCAATA GTGAAAACTA TAAAGTGATA TACTATGCTA 195 00 

TACTGTAGTT ATGAAACTAT GTACGAAAAG CACTGCCATT AATTCCTGAG AACTAAATTA 19560 

CTGATTGGTG TCAAAAAGGA AAACTTCCAA GCGATGATAT CCTGTCTATA CACGACCTAT 1962 0 

AGAAATCTGT AATATACATA TCCGTAAAAC GATAAATTCC CTTTTTGATT TTAAATGAGT 196 8 0 

ATGAAAAGAG AATTTTTTGG CTCTTTGTCA ACTGTAGTGG GTTGAAGAAA AGCTAAGCTC 1974 0 

GAGAAAGGAC AAATTTCATC CTTTCTTTTT TGATATTCAG AGCGATAAAA ATCCGTTTTT 19800 

TGAAGTTTTC AAAGTTCCGA AAACCAAAGG CATTGCGCTT GATAAGTTTG ATGAGATTAT 198 6 0 

TGGTCGCTTC CAGTTTGGCG TTAGAATAGT GTAGTTGAAG GGCGTTGATA ATCTTTTCTT 1992 0 

TATCTTTGAG GAAGGTTTTA AAGACAGTCT GAAAAATAGG ATGAACCTGC TTAAGATTGT 199 80 

CCTCAATAAG TCCGAAAAAT TTCTCTGGTT CCTTATTCTG GAAGTGAAAA AGCAAGAGTT 20040 

CATAGAGCTG ATAGTGGTGT TTCAAGTCTT CCGAATAGCT CAAAAGCTTG TTTAAAATCT 20100 

CTTTATTGGT TAAGTGCATA CGAAAAATAG GACGATAAAA TCGCTTATCA CTCAGTTTAC 20160 

GGCTATCCTG TTGAATGAGT TTCCAGTAGC GCTTGATAG 2 0199 
(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH : 19702 base pairs 

(B) TYPE : nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

ACCCGATGTA TCAGCGGATA TTTACTCTAT TTTTCAAACG ATGTTATACC CACAATAAAA 60 

GAAAAAAGAC CCTAAGGTCT CCTTTGCTTT TATTATTAAA CGCGTTCAAC TTTACCTGAT 120 

TTCAAAGCAC GAGCTGAAGC CCAAACTTTT TTAGGTTTAC CATCGATAAG AACAGTAACT 180 

TTTTGAAGGT TTGGTTTTAC GGCACGTTTT GTTTGGTTCA TCGCGTGTGA ACGGTTGTTT 240 

CCTGATACAG TCTTACGACC TGTAAAGTAA CATACTTTAG CCATTGTGTT TTCCTCCTAT 300 

TAGATCTAAT ATAGCGGATG TGCTAGCACC ACATACCGTA CTATGTTATC ACATTTTCTT 360 

GTTTTTTGCA AGGGAATTGG AAGATTTTTT ATTTGTGTCT TAAATCAGGT CTTGCGTGAC 42 0 

ATTTCTGCTC TCCACATGCC ATCGTTGATT AACAGAACAC CAGAATTAAA ATTATGTGTA 480 

T AAAAAT CAT CTCTAACTGC AGCTAAGGGT ATAGCCGTCA AGTCCAAATC CCACAGCTCA 540 
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TCTATCGATT TTCTTACAAC AATATCTGAA TCCAAATACA GTACACGAGA CTCGCTTACA 600 

TACTTTGGAA TAAAATACCT AAAAAAGCCG CATATGAAAG TCCCTCAAAG GGGAGACGAT 6 60 

AACCTTTCAG AATATTACTG TCAATCTAAA CATTCACAAT CTCACTATTC AAAGTCTCTA 720 

GTCTTTTTTC CATCAATTGG AACCATTCTC GCGGAAGGTC ATCATTAAAA ACATAAAACT 7 80 

TAAGATTATA ATGATGAACA CAAAGAGATT TTATTGTTGT TTCAACTTTA TCCATATAAG 840 

CATTATCTGC ACCTAAGACA ATCGCTTTTT TCTCTTCTTT CACTTTTTAT CTCATTTCTT 900 

TTTATTCCCA TCATATTATT CCCATCATAT GTTTCCCATC ATATGTTTCT ACGTAACCAT 960 

TATTTTCGCC TATTCGTTCG TAAAACCATA CCAGTGGAGA TTTTAGATGA AGTCCCATTA 102 0 

CGGTTTACAA TTTTTACATT ACGACACGGA GTTTTACAAA TCGATTTCAT TTGCCAAACG 1080 

TAGTTAGTGA GGCAGTTAGC TAGTTCGCCA AATAGCGACT AGCGTCCAAC AATTTGGAAC 1140 

TTTAGTTCCA ATTGTTGGTA CTGAGTCACA TCTTCTCCTC TAACTCTACG TCTGGATACT 12 00 

TGTCCGCAAA CCAGCGGAGG GCAAAGTCAT TTTCAAAGAG AAAGACTGGT TGGTCAAAAC 12 6 0 

GGTCTTTGGC TAAGATATTG CGACTTGACG ACATCCGTTC ATCCAAGTCC TCAGGCTTGA 132 0 

TCCAACGAAC GGTCTTTTTA CCCATTGGGT TCATAACTAC TTCCGCATTG TACTCGCCTT 13 8 0 

CCATGCGGTG TTTAAAGACT TCAAACTGGA GTTGACCTAC AGCGCCTAGC ATGTACTCAC 144 0 

CTGTTTGGTA ATTCTTATAA AGCTGAACGG CTCCTTCTTG CACCAATTGC TCAATCCCCT 1500 

TGTGGAAGGA TTTTTGCTTC ATAACATTCT TAGCAGAAAC TTTCATGAAA ATCTCAGGTG 1560 

TAAAGGTTGG CAGGGGTTCA AATTCAAACT TGTTTTTTCC AACCGTCAAG GTATCCCCAA 162 0 

CCTGATAAGT ACCGGTATCG TAAACCCCGA TAATATCACC TGCCACGGCA TTGGTCACAT 168 0 

TCTCACGACT CTCCGCCATA AACTGGGTAA CATTAGATAG TTTAGCCCCC TTACCAGTAC 174 0 

GAGGGAGATT GACACTCATG CCGCGCTCAA ATTCGCCAGA TACGATACGG ACAAAGGCAA 1800 

TACGGTCACG GTGACGAGGG TCCATGTTGG CTTGGATTTT AAAGACAAAG CCTGAGAAAT 186 0 

CCTTGTCATA AGGATCCACA ATTTCACCGT CTGTTTTCTT GTGACCATGT GGTTCTGGAG 1920 

CAAACTTGAG GAAGGTTTCA AGGAAGGTCT GCACACCAAA GTTTGTCAGG GCTGAACCGA 1980 

AAAAGACAGG CGTCAATTCT CCAGCCAGAA TAGCTTCCTC TGAAAACTCA TTCCCGGCTT 2 040 

CATTTAAAAG CTCAATGTCA TCCTTGACTT GCTCGTAGAA AGGATTGCTA CCAAAGAGTT 2100 

TGTCCCCGTC TTCTAGACTG GCAAAACGCT CATCCCCTTT GTAAAGCTCT AAACGTTGGT 2160 

TATAGAGGTC ATACAAGCCC TCAAAGGCTT TCCCCATCCC GATAGGCCAG TTCATAGGGT 2220 

AGCTAGCAAT GCCCAAGATT TCTTCCAATT CTTGCAAGAG ATCCAAAGGC TCACGACCGT 2280 
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CACGGTCCAG CTTGTTCATA AAGGTAAAGA CTGGAATGCC ACGATGTTTC ACAACCTCAA 2 340 

ACAATTTCTT GGTTTGAGCC TCGATCCCCT TGGCAGAGTC CACGACCATG ACCGCAGCAT 2400 

CCACCGCCAT CAAGGTACGA TAGGTATCTT CTGAGAAGTC CTCGTGCCCT GGCGTGTCTA 24 60 

AGATATTCAC GCGCTTGCCG TCGTAGTCAA ATTGCATAAC AGATGAAGTA ACAGAAATCC 2 52 0 

CACGTTGCTT CTCGATATCC ATCCAGTCAG ATTTAGCAAA AGTCCCTGTT TTCTTCCCTT 2 580 

TTACCGTACC AGCCTCACGA ATCTCACCCC CAAAGTAGAG TAACTGCTCA GTGATGGTTG 2 64 0 

TTTTCCCCGC GTCCGGGTGG GAGATAATGG CAAAGGTACG ACGTTTCTTA ATTTCTTCTT 27 00 

G AAT ATT CAT AAGTTCTCTT TCTTTGATTC TCTATTTTTC TTGTTTCAAT AGCTGAGAAT 27 6 0 

GATTTTTACA TTGGATTTTA CCATTCCTTT CAACACTCCA TTATATCGGA TTTTAGCATT 2820 

TTTTTCAATT TCTATTTCTT TTCACTTCCC CCTCCCTTAT TTATAGGAAA ATATGGTAAA 2 880 

ATAGAACAGA CTAAAAATCA TCATTTCACG AAAGGATGCA AGATGAAAAT TACGCAAGAA 2940 

GAGGTAACAC ACGTTGCCAA TCTTTCAAAA TTAAGATTCT CTGAAGAAGA AACTGCTGCC 3 000 

TTTGCGACCA CCTTGTCTAA GATTGTTGAC ATGGTTGAAT TGCTGGGCGA AGTTGACACA 30 60 

ACTGGTGTCG CACCTACTAC GACTATGGCT GACCGCAAGA CTGTACTCCG CCCTGATGTG 3120 

GCCGAAGAAG GAATAGACCG TGATCGCTTG TTTAAAAACG TACCTGAAAA AGACAACTAC 3180 

TATATCAAGG TGCCAGCTAT CCTAGACAAT GGAGGAGATG CCTAATGACT TTTAACAATA 324 0 

AAACTATTGA AGAGTTGCAC AATCTCCTTG TCTCTAAGGA AATTTCTGCA ACAGAATTGA 3300 

CCCAAGCAAC ACTTGAAAAT ATCAAGTCTC GTGAGGAAGC CCTCAATTCA TTTGTCACCA 3360 

TCGCTGAGGA GCAAGCTCTT GTTCAAGCTA AAGCCATTGA TGAAGCTGGA ATTGATGCTG 3420 

ACAATGTCCT TTCAGGAATT CCACTTGCTG TTAAGGATAA CATCTCTACA GACGGTATTC 3480 

TCACAACTGC TGCCTCAAAA ATGCTCTACA ACTATGAGCC AATCTTTGAT GCGACAGCTG 3 54 0 

TTGCCAATGC AAAAACCAAG GGCATGATTG TCGTTGGAAA GACCAACATG GACGAATTTG 3600 

CTATGGGTGG TTCAGGTGAA ACTTCACACT ACGGAGCAAC TAAAAACGCT TGGAACCACA 3 660 

GCAAGGTTCC TGGTGGGTCA TCAAGTGGTT CTGCCGCAGC TGTAGCCTCA GGACAAGTTC 372 0 

GCTTGTCACT TGGTTCTGAT ACTGGTGGTT CCATCCGCCA ACCTGCTGCC TTCAACGGAA 3 780 

TCGTTGGTCT CAAACCAACC TACGGAACAG TTTCACGTTT CGGTCTCATT GCCTTTGGTA 3 840 

GCTCATTAGA CCAGATTGGA CCTTTTGCTC CTACTGTTAA GGAAAATGCC CTCTTGCTCA 3 900 

ACGCTATTGC CAGCGAAGAT GCTAAAGACT CTACTTCTGC TCCTGTCCGC ATCGCCGACT 3 9 60 

TTACTTCAAA AATCGGCCAA GACATCAAGG GTATGAAAAT CGCTTTGCCT AAGGAATACC 4020 

TAGGCGAAGG AATTGATCCA GAGGTTAAGG AAACAATCTT AAACGCGGCC AAACACTTTG 4080 
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AAAAATTGGG TGCTATCGTC GAAGAAGTCA GCCTTCCTCA CTCTAAATAC GGTGTTGCCG 4140 

TTTATTACAT CATCGCTTCA TCAGAAGCTT CATCAAACTT GCAACGCTTC GACGGTATCC 42 00 

GTTACGGCTA TCGCGCAGAA GATGCAACCA ACCTTGATGA AATCTATGTA AACAGCCGAA 42 SO 

GCCAAGGTTT TGGTGAAGAG GTAAAACGTC GTATCATGCT GGGTACTTTC AGTCTTTCAT 4320 

CAGGTTACTA TGATGCCTAC TACAAAAAGG CTGGTCAAGT CCGTACCCTC ATCATTCAAG 4 3 80 

ATTTCGAAAA AGTCTTCGCG GATTACGATT TGATTTTGGG TCCAACTGCT CCAAGTGTTG 4440 

CCTATGACTT GGATTCTCTC AACCATGACC CAGTTGCCAT GTACTTAGCC GACCTATTGA 4 500 

CCATACCTGT AAACTTGGCA GGACTGCCTG GAATTTCGAT TCCTGCTGGA TTCTCTCAAG 4 5 60 

GTCTACCTGT CGGACTCCAA TTGATTGGTC CCAAGTACTC TGAGGAAACC ATTTACCAAG 462 0 

CTGCTGCTGC TTTTGAAGCA ACAACAGACT ACCACAAACA ACAACCCGTG ATTTTTGGAG 4 680 

GTGACAACTA ATGAACTTTG AAACAGTCAT CGGACTTGAA GTCCACGTAG AGCTCAACAC 4740 

CAATTCAAAA ATCTTCTCAC CTACTTCTGC CCACTTTGGA AATGACCAAA ATGCCAACAC 4 800 

TAACGTGATT GACTGGTCTT TCCCAGGAGT TCTACCAGTT CTCAATAAAG GGGTTGTTGA 4 8 60 

TGCCGGTATC AAGGCTGCTC TTGCCCTCAA CATGGACATC CACAAAAAGA TGCACTTTGA 492 0 

CCGCAAGAAC TACTTCTATC CTGATAACCC CAAAGCCTAC CAAATTTCTC AGTTTGATGA 49 80 

ACCAATCGGA TATAATGGCT GGATTGAAGT CAAACTAGAA GACGGTACGA CCAAGAAAAT 5040 

CCGTATCGAA CGTGCCCACC TAGAGGAAGA CGCTGGTAAA AACACCCATG GTACAGATGG 5100 

CTACTCTTAT GTTGACCTCA ACCGCCAAGG GGTTCCCTTG ATTGAGATTG TATCTGAGGC 5160 

AGATATGCGT TCTCCTGAAG AAGCCTATGC TTATCTGACA GCCCTCAAGG AAGTTATCCA 522 0 

GTACGCTGGC ATTTCTGACG TTAAGATGGA GGAAGGTTCG ATGCGTGTGG ATGCCAACAT 52 8 0 

CTCCCTTCGT CCTTATGGTC AAGAGAAATT CGGTACCAAG ACTGAATTGA AGAACCTCAA 534 0 

CTCCTTCTCA AACGTTCGTA AAGGTCTTGA ATACGAAGTC CAACGCCAGG CTGAAATTCT 54 00 

TCGCTCAGGT GGTCAAATCC GCCAAGAAAC ACGCCGTTAC GATGAAGCGA ATAAAGCAAC 54 6 0 

CATCCTCATG CGTGTCAAGG AAGGGGCTGC TGACTACCGC TACTTCCCAG AACCAGACCT 552 0 

ACCCCTCTTT GAAATTTCTG ACGAGTGGAT TGAGGAAATG CGGACTGAGT TGCCAGAGTT 55 8 0 

TCCAAAAGAA CGTCGTGCGC GTTATGTATC TGACCTTGGT TTATCAGACT ACGATGCTAG 5640 

TCAGTTGACT GCTAATAAAG TCACTTCTGA CTTCTTTGAA AAAGCTGTTG CCCTAGGTGG 5700 

TGATGCCAAA CAAGTCTCTA ACTGGCTCCA AGGGGAAGTC GCTCAGTTCT TGAATGCTGA 5760 

AGGTAAAACA CTGGAACAAA TCGAATTGAC ACCAGAAAAC TTGGTTGAAA TGATTGCCAT 582 0 
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CATCGAAGAC GGTACTATTT CATCTAAGAT TGCCAAGAAA GTCTTTGTCC ATCTAGCTAA 5880 

AAATGGCGGT GGCGCGCGTG AATACGTGGA AAAAGCAGGT ATGGTTCAAA TTTCAGATCC 5940 

AGCTATCTTG ATCCCAATCA TCCACCAAGT CTTTGCCGAT AACGAAGCTG CTGTTGCCGA 6000 

CTTCAAGTCA GGCAAACGTA ACGCCGACAA GGCtTTACAG GATTCCTTAT GAAGGCAACC 6060 

AAAGGCCAAG CCAACCCACA AGTTGCCCTT AAACTACTTG CACAGGAATT GGCGAAGTTG 612 0 

AAAGAAAACT AGACAGAACA AAACCAGCCC TAAGGTTGGT TTTTTCTTCT CTACCAACTC 6180 

CCAATAACTA TTTTGGCTTT ATTTCCAGAG TATTTTATGG TAAAATGAAG AGTAATAATA 6240 

TTTATTAAAG AGGTAAAAAC ATGATTGAAG CAAGTACCTT AAAAGCTGGT ATGACCTTTG 6300 

AAACAGCTGA CGGCAAATTG ATTCGCGTTT TGGAAGCTAG TCACCACAAA CCAGGTAAAG 63 60 

GAAACACGAT CATGCGTATG AAATTGCGTG ATGTCCGTAC TGGTTCTACA TTTGACACAA 642 0 

GCTACCGTCC AGAGGAAAAA TTTGAACAAG CTATTATCGA GACTGTCCCA GCTCAATACT 6480 

TGTACAAAAT GGATGACACA GCATACTTCA TGAATACAGA AACTTATGAC CAATACGAAA 654 0 

TCCCTGTAGT CAATGTTGAA AACGAATTGC TTTACATCCT TGAAAACTCT GATGTGAAAA 6600 

TCCAATTCTA CGGAACTGAA GTGATCGGTG TCACCGTTCC TACTACTGTT GAGTTGACAG 6660 

TTGCTGAAAC TCAACCATCT ATCAAAGGTG CTACTGTTAC AGGTTCTGGT AAACCAGCAA 672 0 

CGATGGAAAC TGGACTTGTC GTAAACGTTC CAGACTTCAT CGAAGCAGGA CAAAAACTCG 6780 

TTATCAACAC TGCAGAAGGA ACTTACGTTT CTCGTGCCTA ATCTCTAGAA AGAGGTCATT 6840 

CTATGGGAAT TGAAGAACAA CTTGGCGAAA TCGTTATCGC CCCACGTGTA CTTGAAAAAA 69 00 

TCATTGCTAT CGCTACTGCA AAGGTAGAGG GTGTTCACTC TTTTTCAAAC AGATCAGTGT 6960 

CTGATACCCT TTCAAAACTT TCACTCGGCC GTGGCATTTA TCTTAAAAAC GTGGACGAAG 702 0 

AACTCACAGC AGATATCTAT CTCTACCTTG AGTACGGAGT AAAAGTTCCT AAGGTAGCGG 7080 

TTGCTATCCA GAAAGCTGTC AAAGATGCCG TCCGTAATAT GGCTGATGTA GAACTCGCTG 7140 

CTATCAATAT TCACGTTGCA GGTATCGTCC CAGATAAAAC ACCAAAACCA GAATTGAAAG 7200 

ATCTATTTGA CGAGGACTTC CTCAATGACT AGTCCACTAT TAGAATCTAG ACGCCAACTC 7260 

CGTAAATGCG CTTTTCAAGC TCTCATGAGC CTTGAGTTCG GTACGGATGT CGAAACTGCT 7320 

TGTCGTTTCG CCTATACTCA TGATCGTGAA GATACGGATG TACAACTTCC AGCCTTTTTG 7380 

ATAGACCTCG TTTCTGGTGT TCAAGCTAAA AAGGAAGAAC TAGATAAGCA AATCACTCAG 744 0 

CATTTAAAAG CAGGTTGGAC CATTGAACGC TTAACGCTCG TGGAGAGAAA CCTCCTTCGC 7500 

TTGGGAGTCT TTGAAATCAC TTCATTTGAC ACTCCTCAGC TGGTTGCTGT TAATGAAGCT 7 560 

ATCGAGCTTG CAAAGGACTT CTCCGATCAA AAATCTGCCC GTTTTATCAA TGGACTGCTC 7 620 
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AGCCAGTTTG TAACAGAAGA ACAATAAGGC TCTTTGTCAA CTGTAGTGGG TTGAAAAAAA 7 680 

GCTAAGCTCG AGAAAGGACA AATTTCGTCC TTTCTTTTTT GATGTTCAAA GCGATAAAAA 7740 

TCCGTTTTTT GAAGTTTTCA AAGTTTCGAA AACCAAAGGC ATTGCGCTTG ATAAGTTTGA 7800 

TGAGATTATT GGTCGCTTCC AGTTTGGCAT TAGAATAGTG TAGTTGAAGG GCGTTGACAA 7 860 

TCTTTTCTTT ATCTTTGAGG AAGGTTTTAA AGACAGTCTG AAAAATAGGA TGAGCCTGCT 7 92 0 

TAAGATTGTC CTCAATAAGT CCGAAAAATT TCTCTGGTTC CTTATTCTGG AAGTGAAACA 7 9 BO 

GCAAGAGCTG ATAGAGCTGA TAGTGGTGTT TCAAGTCTTG TGAATGGCTC AAAAGCTTGT 8040 

CTAAAATCTC TTTATTGGTT AAGTGCATAC GAAAAGTAGG ACGATAAAAT CGCTTATCAC 8100 

TCAGTCTACG GCTATCCTGT TGAATGAGTT TCCAGTAGCG CTTGATATCC TTGTATTCAT 8160 

GGGATTTTCG ATGAAACTGA TTCATGATTT GGACACGCAC ACGACTCATG GCACGGCTAA 8220 

GATGTTGTAC AATGTGAAAG CGATCAAGAA CGATTTTAGC ATTCGGGAGT GAAACAGTCT 82 80 

GGGAGACTGT TTCAGCCTGA GCCTAGGAAT TTGAAAGCGA AGCTGTTTAG CCAAGTCATA 83 4 0 

GTAAGGGCTA AACATATCCA TAGTAATAAT TTTGACGCGA CATCGGACAA CTCTATCGTA 84 00 

GCGAAGAAAG TGATTTCGAA TGATAGCTTG TGTTCTACCC TCAAGAACAG TGATGATATT 84 60 

GAGATTGTTA AAATCTTGCG CAATGAAGCT CATCTTTCCC TTTGTAAAAG CATACTCATC 852 0 

CCAAGACATA ATCTCAGGAA GACAAGAAAA ATCATGTTTA AAGTGAAAAT CATTGAGCTT 8580 

ACGAATAACA GTTGAAGTTG AGATGGAAAG CTGATGGGCA ATATCAGTCA TAGAAATCTT 8640 

TTCAATCAAC TTTTGAGCAA TCTTTTGGTT GATGATACGA GGGATTTGGT GATTTTTCTT 8700 

GACGATAGAA GTTTCAGCGA CCATCATTTT TGAACAGTGA TAGCACTTGA ATCGACGCTT 8760 

TCTAAGGAGA ATTCTAGTAG GCATACCAGT CGTTTCAAGA TAAGGAATTT TAGAAGGTTT 8 82 0 

TTGAAAGTCA TATTTCTTCA ATTGGTTTCC GCACTCAGGG CAAGATGGGG CGTCGTAGTC 8880 

CAGTTTGGCG ATGATTTCCT TGTGTGTATC CTTATTGATG ATGTCTAAAA TCTGGATATT 8940 

AGGGTCTTTA ATGTCTAGTA ATTTTGTGAT AAAATGTAAT TGTTCCATAT GAATCTTTCT 9 000 

AATGAGTTGT TTTGTCGCTT TTCATTATAG GTCATATGGG ACTTTTTTTC TACAATAAAA 906 0 

TAGGCTCCAT AATATCTATA GGGGATTTAC CCACTACAAA TATTATAGAG CCAACAATAA 912 0 

AAAGAAAAAG TGTTTGATAG ATATCAAACA CTTTTTTCTT TGCCTCCCAC TATCTAAAAA 9180 

AATGATAATA GATATAATTG TAAACAAAAA TCCAGATAGG TTTTGCATGA TTGAGAAAGT 9240 

TAAAAAAACT ATGGCAGAGA ATCGTTAATC TCAGATTGTC GGTAGAACGA TAAACAAGGG 9300 

CAAAAAAGAA ACCAATCAGA CTATAATATA ATAAACTAAT TGGATCTCTG TGAGATAGTA 93 6 0 
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TCAAATGGCT AATCCCAAAG ATGATAGCAG ATAGGATAAC ATCCAAATAG TACTTGGACT 9420 

AGGGAAAGAA GGTATTCATA AAATACCCTC TATCAAGAGT CTCCTCAAAA ACAGGACCGA 9480 

TGATTACAGG CAGGACAAAA GATAAGATAG TCGATAAAAA GGTTGGTTGT CCATTTGAAA 9 540 

AAAGCACGGT AAAATACTCA T C ATGAATAT TCCTATGATT AATCAAATGA GCATAGCGTG 9 600 

CCCAAAAATT ACCGAGAATC TGATAAACCA CATAAGTTGC AAATAAGTAG AAGACAAATG 9660 

ACCAGTTCCA GCTCTTTTTC TCAAAGATAA AGAGCATCTT TTTCTTTTTT AACCTCCAAA 9720 

TTAATAGAAG GAAACTTCCC ACTAATCCCA TTGTTAAAAT AAGAGAATAG ACATCAGCTC 9780 

CTAACCCTAA AATGATCGTC ACATACAATC CAATTGTTTG TGGTAAATAG GTAGATAGTA 9 840 

AAATAATAAG CAAAAATATT CCAAATTGTC TTAGTTTTTT TGTGTTTCTC ATCGTACTTT 9900 

TTTGAAAGAT TACCCTGCTC GGAAGCCGTA CTTCCAAGCA TCTATATAAG AATTAAGTGC 9960 

CCCTTGCCTC ATATAGGGAG CAAATTCTCT ATAATATAAC CATCTACTAT ATCCATCTTC 10020 

CCAAACAGCA AGACCACCTG AAGTTTGCTC CAAGTCCTCA GTTGAAAGAA CTGTAAATGT 10080 

ATTTGTACCT GTCATTGCAA GTACCTTCTT AAAATAGATT GTTGTAGGCT CACATTTATA 10140 

GTATATTTCT TTTTTTGTCT ATTTTATAGC CCATCTCCTC AACTGGCAAT TTTTCGACCT 10200 

GAATTACATT TTTCCATAAA AAATGAGACC TTTCTAGTCT CATTTAGTCA TTCTTAGTAT 102 60 

TTTCTAAATC GTTGATAGCG TTCTTCCAGC AACTCTTCTA GCGGTTTTTG TGAAAGTCTA 103 20 

GCCAGCTCCG TTTGGAGTTC TTTTTTGACA CTCTTAATCA GTTCTTTACT AGAAAGTCCT 103 80 

ATTTCAGAAA TCACCTTATC CACCACGTCC ATTTCTAACA GTTCATGCGA AGTGATTTTC 10440 

ATCAGTTCTG CTGCTTCCAT AGCGCGAGTA CCGTCCTTCC ATAAAATGGA AGCAAAGCCT 105 00 

TCTGGACTGA GAATGGCATA GATAGAATTT TCCAGCATCC AGACACGGTC CGCGACAGCT 105 60 

AGAGCCAGAG CCCCGCCTGA ACCACCTTCA CCGATAATAA TGGCGATAAT AGGAACTTTC 10 620 

AGGTCACTCA TTTCCATGAG ATTGCGAGCG ATAGCTTCCC CTTGACCACG TTCTTCCGCT 106 80 

CCGACACCAG GATAAGCACC TGCTGTATTG ATAAAGGTCA CAACTGGACG GCCAAATTTC 1074 0 

TCAGCCTGTT TCATCAACCG CAGTGCCTTT CGGTAGCCTT CTGGATGTGG TTGGCCAAAA 10800 

TTCCGTTTGA GGTTGTCTTG CAAACTCTTG CCTTTTTGGA TACCAACCAC TGTTACAGCT 10860 

TGGTCTCCAA GCCAACCAAT ACCACCAACA ACTGCACCAT CATCACGAAA AGAACGGTCA 1092 0 

CCATGTAATT GGATAAATTC ATCAAAAATG CCTGTCGCAA AGTCCAAGGT TGTCAAGCGA 10980 

CTCTGCTCAC GCGCTTCTCT GACTATTTTT GCAATATTCA TCTAGGACTC CCTCCATGCA 11040 

ATCTGACTAG GCTAGCAATC GTATCTGGTA AGTCTCTTCT TTTGACAATA GCATCCACAA 11100 

AGCCATGTTC TAATAGGAAT TCTGCCTTTT GGAAATCCTC AGGCAAGCTT TCACGAACCG 11160 
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TATTTTCAAT CACACGACGC CCAGCAAAAC CAACCAAGCT CTGTGGTTCA GCCAGAATGA 11220 

TATCGCCTTC CATAGCGAAA GAAGCTGTCA CACCACCAGT CGTTGGATCT GTCAAAATGG 112 80 

TCAGGTAAAA GAGACCAGCA TTTGAATGGC GTTTAACCGC CGCAGAGATC TTAGCCATCT 11340 

GCATGAGACT CATGATTCCT TCCTGCATAC GGGCTCCACC AGAGGCTGTG AATAGGACAA 11400 

CTGGCAATTT TTCGACAGTC GCATACTCAA ACAAACGAGT GATTTTTTCA CCTACAACCG 114 60 

TACCCATAGA AGCCATGATA AAGTTAGAAT CCATAATCCC AAGAGCCACA GTCTGACCTT 11520 

TAATAAGAGC AGTTCCTGTC ACAACGGCTT CATGCAGACC TGTTTTTTCA CGCATAGATG 11580 

CCAGTTTCTT TTGGTAACCA GGGAAATGCA AGGGATCCTT GCTTTCAATC CCTGTAAACA 11640 

ATTCTTTGAA GGTTCCCATA TCAATCGTCA AAGCCAAGCG TTCTTGGGCA GAAATACGAA 117 00 

AGGTATAGCT ACAGTGCGGA CAGATACGTT CACTTCCCAG ATCCTTCTGA TAGATGGTAT 117 60 

GCTTACAGCC TGGACACTGG GAAAATAATT CATCTGGAAC CTCTGGCTTA GCTTGAGGTT 11820 

TTTCCCTAAC CGAACGATTG GGATTGATTC GAATATACTT ATCTTTTTTA CTAAATAGAG 11880 

CCATTGATTC CCCTTTTCGG TTTAAACTCT TAAAGTCATT TTATTCTTTT TCTTGATATT 11940 

TAGGTAAGAA GGTTTCCATC AAGAAGGAAG TATCATAATC CCCAGCAATG ACATTGCGAT 12 000 

CTGAAATGAG GTCAAGCTGG AAATCTGCAT TGGTCTGCAC TCCTTCAATT TCTAATTCAT 12 060 

AGAGGGCACG TTGCATTTTC ATCAAGGCGT CAAAACGATT TTCGCCGTGT ACTATGATTT 1212 0 

TGGCAATCAT ACTATCATAA TAAGGCGGAA TGGTATAACC TGGATAAACT GCTGAATCCA 12180 

CGCGCAAGCC AACTCCACCA CTTGGCAGAT AGAGATTAGT AATCTTACCT GGACTTGGAG 12240 

CAAAGTTAAA GGCTGGGTTT TCTGCATTGA TACGACACTC GATGGCATGA CCGCGTAGGA 12 3 00 

CAATATCTTC TTGCTTAACA GACAAAGGCT GACCTGCCGC AATGCAAATC TGTTCCTTAA 123 60 

CGATATCAAC ACCTGAAACA AACTCTGTTA CTGGATGTTC TACCTGAACA CGAGTATTCA 124 2 0 

TCTCCATGAA ATAGAAATTG CTACTTGCTT CATCAAGAAG AAATTCAATG GTTCCTGCAT 12480 

TCTCATAGCC AACAAACTCT GCCGCTCGAA CAGCAGCAGC ACCTATTTCA TGACGCAGCG 12540 

TTTTTCCGAT TGCAATCGAG GGACTTTCTT CCAAAACCTT TTGGTTATTC CTTTGAAGAG 12600 

AACAATCCCG TTCACCCAAG TGAATCACAT GTCCATGCTC ATCACCTAGG ATTTGAACCT 12 660 

CAATGTGCCG AGCTGGATAG ATAACCCGTT CTATGTACAT GGCACCATTG CCATAATTGG 12720 

CCTTGGCCTC ACTAGAGGCA GTTTCAAAGG CAGAAACGAG GTCATCTGGT TTTTCAACCT 12780 

TACGAATCCC TTTACCACCT CCACCTGCTG AAGCCTTGAG CATAACAGGA TAGCCAATTT 12840 

TTTCAGCAAC AATCAAAGCT TCTTCAGAGT TATGCACTTC TCCATCTGAA CCTGGTATAA 12 900 
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CAGGCACACC TGCTTTAATC ATCTGAGCAC GCGCATTGAT CTTATCCCCC AT CATATCCA 12 9 60 

TAACATGACC AGATGGACCG ATAAACTTGA TACCTACTTC TTCACACATG GTCGCAAATT 13020 

TGGAATTTTC ACTGAGAAAT CCAAAACCAG GGTGAATAGC TTCTGCCTCA GTCAAGACTG 13 0 80 

CAGCTGATAG AACTGCATTA ATATTGAGAT AAGACTCTGT TGCCTTGCCA GGACCAATAC 13140 

AAACTGCTTC ATCTGCCAAA AGCGTATGAA GAGCTTCCTT ATCAGCAGTT GAATAAACCG 132 00 

CTACCGTCGC AATCCCCAAT TCACGTGCCG CACGGATAAT ACGAACCGCA ATTTCACCAC 132 60 

GATTGGCAAT TAAAATTTTT CGAAACATGG AGAACCTCCT TAGTTCCCAA TTGCAAAAGT 13320 

AAGGGTACCA CTGGCTGCAA GCTTGCCATC CACTTCAGCC TTTGCTTCAA CCACAGCTAT 133 80 

GGTGCCACGA CGTTTTACAA AAGTCGCTGT CATAACCAAT TGGTCGCCTG GTACAACTTG 13440 

CTTCTTGAAC TTAACCTTGT CCATACCAGC GTAAAAGACC AGTTTTCCTT TATTTTCAGG 13 500 

TTTTGATAAC TCCAACACAC CGGCAGTTTG CGCCAAGGCT TCCATAATCA CAACACCTGG 13 5 60 

CATAACTGGG TATTGAGGAA AGTGGCCGTT AAAGAAAGGC TCGTTGATGG TCACATTTTT 13 62 0 

GATAGCAACA ATGGTATCCT CGCTCACTTC CAAGACACGG TCCACTAGAA GCATAGGATA 13 6 80 

ACGGTGGGGA AGAGCTTCTT TGATTCCTTG AATATCGATC ATTTGATACG TACCAATCCT 137 40 

TTACCAAACT CAACCATTTC TTCGTTAGAG ACGAGAATTT CCGTTACCAC ACCATCCTTA 13 800 

GGAGCTGGGA TTTCATTCAT GACTTTCATG GCTTCGATAA TTACCAATGT TTGACCTTTT 13860 

TTGACACTAT CACCAACTGT AACGAAGGCA GGTTTATCTG GTCCAGCAGC CAAGTAAACC 13920 

ACTCCAACAA GTGGACTCTC TACAAGATTT CCCTCAGTAG CCACACTTGC TTCAGCTGGA 139 80 

GCTGGAACTT CTTCTGCTAC AGTCTCTGCT GGAGCAGATG TAGGAGCTAC TGGACTCGGT 14040 

GTTGCTAGAA CGGGTGCTGG AGCGACTTGA GTTGCAACTT CAGGCACAGG TCTTGCTTCA 14100 

TTCTTGCTAA ACTGCAACTC ATCCGTCCCA TTTTTATAAG AAAATTCTCT CAAACTTGAC 14160 

TGGTCAAATT GAGTCATCAA GTCTTTAATA TCGTTTAAAT TCATACTTAT CTATTCTCCC 142 2 0 

AACGTTTGAA AGCAAGAACT GCATTGTGGC CTCCAAAACC AAAAGTATTT GAAATAGCGT 142 80 

ATGGAATTTC TTTCTCCAAG CCTTGTCCAT AAACGAC AT T AGCTTCGATA TAATCTGATA 1434 0 

CTTCACTTGT CCCAGCTGTC ATTGGTACAA AGTTATGACG CATAGCTTCG ATGGTGACGA 14400 

TAGCTTCTAC TGCACCCGCA GCCCCCAGCA AATGTCCTGT AAAAGACTTG GTTGATGATA 14460 

CAGGTACTTC CTTACCAAGA ACAGCTACGA TAGCACCACT TTCTCCTTTT TCATTGGCAG 14520 

GAGTTGACGT TCCGTGAGCA TTGACATAGG CTACTTGCTC TGGAGAAATC TCAGCTTCTT 14580 

CCAAGGCTAG TTTGATGGCC TTGATAGCTC CCTGACCTTC TGGATGTGGA GAAGTCATGT 14640 

GGTAGGCATC ACAAGTATTT CCGTAACCAA CCACTTCAGC CAGGATAGTA GCTCCACGTT 147 00 
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TTTCAGCGTG TTCAAGACTT TCTAGAACCA ACATCCCTGA ACCTTCACCC ATAACAAACC 147 60 

CATTGCGATC CTTATCAAAT GGGATCGAAG CACGAGTTGG ATCCTCTGTA GTAGAGAGAG 14820 

CTGTTAAGGC TTGGAAACCA GCGATGGCAA AAGGTGTGAT AGAAGCTTCT GTTCCTCCCA 14880 

CCAACATCAC ATCTTGGAAA CCAAACTTAA TGGAGCGGAA GGCATCCCCA ATCGCATCAT 1494 0 

TTGATGAAGA GCAGGCAGTA TTGATAGATT TACAAACACC GTTTGCACCA AAACGCATGG 15000 

CTACATTCCC AGAAGCCATA TTTGGTAAAG CTTTTGGAAG AGTCATTGGT TTGACACGTT 15060 

TGGGTCCTTT TTCATGAAGG CGAAGTACCT GATCTTCAAT TTCCTTGATT CCACCAATAC 15120 

CAGATGCAAC GATAACACCA AAACGATCCC TATTAAGAGC CTCTACATCA AGATTGGCAT 15180 

GATTTACAGC CTCTTGGGCT GCATACAAGG CATATAAAGA ATAGTTATCA AAACGGTTGG 15240 

TATCTTTTTT TACAAAGTAT TTATCGAACG GAAAATCTTG GATTTCTGCC GCATTATGCA 15300 

CATCAAAGTC ACTATGATCA AATTTTGTAA TGCCACCAAT GCCGATTTTC CCAGTTGCTA 15360 

AACTATTCCA AAATTCTTCT GGTGTATTTC CGATTGGAGA TGTTACTCCA TAACCTGTTA 15420 

CCACTACTCG ATTTAGTTTC ATTCTTTTCA CCTCTAGCTT TCGCTACATA CTTAAGCCAC 15480 

CATCAATGGC AACCACTTGT CCAGTTAGAT AATCTTGGCC TGCTAAAAAT ACTGTCAAAT 15 540 

CTGCAACCTG CTCTGCCTGC CCAAATTCTT TCATCGGAAT CTGAGCTAGT GTAGCTTCCT 15600 

TAATCTTATC TGACAGGATA GCGGTCATAT CAGACTCAAT CATTCCTGGA GCAATCACAT 15660 

TGACTCGTAT ATTCCGACTA GCGACCTCGC GTGCCACAGA CTTGGTAAAG CCAATCAAGC 15720 

CAGCCTTAGA AGCAGCATAA TTAGCTTGAC CAATATTCCC CATCAAACCA ACAACACTAG 15780 

ACATATTAAT GATAGCACCT TCTCTGGCTT TCATCATCGG TTTCAAGACT GATTGTGTCA 15840 

TATTAAAGGC ACCAGTCAGA TTGACCTTGA GCACTTTTTC AAAATCTGCT TCTGTCATCT 15 900 

TGAGCATAAG AGTATCTTGG GTAATCCCTG CATTGTTGAC CAAAACATCT ACTGAACCCA 15960 

GTTCTGCAAT AGCTTGATCA ATCATACGCT TAGCGTCTGC AAAATCTGAT ACATCTCCTG 16020 

AAATGGGAAC CACCTTGATA CCATAGTTTG AAAACTCAGC GAGCAATTCT TCTGAGATTG 16080 

CCCCACGACT GTTTAAGACA ATGTTGGCTC CTGCTTGAGC AAACTTGTGG GCGATGGCAA 16140 

GACCAATTCC ACGACTCGAA CCTGTAATAA AGATATTTTT ATGTTCTAGT TTCATTTTTT 162 00 

TCCTTTCAAA ACTTCTACTT ATTTTAGTCT ATTTTTCTAA AAGTGCTACT AAACTCGCTT 162 60 

GATCTTCCAC ATGAGCTAAG TGAGCAGTTT GATCAATTTT TTTAACAAAA CCTGACAAGA 16320 

CTTTCCCCGG TCCAATCTCG ATAAAGTTGC TTATGCCTGC TTCTTGCATG ACCCCAATAC 163 80 

TTTCATAGAA ACGAACGGGT TCCTTGACCT GACGCGTCAA GAGCTGAGCA ATGTCCTCTT 16440 
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TTTGCATCAC AGCAGCTTCT GTATTGCCGA CTAGGGGACA AGTAAAATCT GAAAAACTTA 16500 

CCTGAGCTAG AGTTTCAGCT AGTTTCTGGC TAGCAGGTTC AAGGAGAGCG GTGTGAAAGG 16560 

GACCTGACAC CTTAAGAGGA ATCAAGCGTT TGGCACCTGC TTCTTGCAAA AGTTCAACCG 1662 0 

CTCGATCAAC TGCAACCACT TCTCCAGCAA TGACGATTTG TGCAGGTGTG TTATAGTTGG 166BO 

CTGGAGTAAC CACTCCAAGT TCAGAAGCTT TTTGACAGGC TTCTTCAATG ACCTCTACTG 16740 

GCGTATTGAG AACTGCTACC ATCTTGCCAG AGTCAGCAGG AGCCGCTTCT TCCATATAGG 16800 

CTCCACGCTT AGCTACCAAG GCAACCGCAT CTTCAAAATC CAAGGCGCCA CTTGCCACCA 168 60 

AGGCAGAGTA TTCTCCAAGA GACAAACCAG CAACCATATC AGGCTGATAG CCCTTTTCTT 16920 

GCAATAAACG GTAGATAGCA ACCGAAGTCG CTAGAATGGC TGGTTGCGTA TAGCGGGTCT 16980 

GATTGAGTTT GTCTTCTTCC GTATCGATGA GATAACGCAA ATCATAACCG AGCACCTGGC 1704 0 

TCGCTCGATC AATCGTTTCT TTAACAATCG GATACTGATC ATAGAAATCC CGTCCCATCC 17100 

CTAGATACTG GGCACCTTGA CCAGCAAATA AAAAGGCTGT TTTAGTCATT TCTTACAACT 17160 

CCTGTCCAGC GAGAGGCTTC TTCTTGAATT TTCTTAGCGG CTCCGTAATA CAAATCTTTT 1722 0 

AGGATTTCTT CAGCTGTTTC TTCTTTAGAA ACAAGCCCTG CGATTTGACC TGCCATAACA 172 80 

GAGCCACCAT CCACATCACC GTGAACAACT GCTTTGGCTA GAGCACCTGC TCCCATTTGT 17 340 

TCAAAGATTT CTAAATCAGG ATCTTCTTGC TTAAAGGCAT CTTTTTCAGC CAGTTCAAAA 17400 

TCTCTAGTCA ACTGATTTTT AATAGCACGA ACAGCATGAC CAAAGTGCTG AGCTGAAATC 17460 

GTAGTATCAA TATCCCTTGC TTTTAAAATT TTCTCCTTGT AGTTTGGATG GGCATTCGAC 1752 0 

TCTTTTGCAA CTACAAACCG TGTCCCCACC TGTACAGCCT CTGCACCTAG CATAAAGCCA 17580 

GCCGCAGCAC CTTCACCATC CGCAATTCCT CCTGCAGCAA TAACAGGAAT AGATATAGCT 17 640 

GTGGCTACCT GTCGCACCAA GGTCATGGTT GTTAATTTAC CGATATGCCC CCCAGCTTCC 17700 

ATTCCTTCTG CAATAACAGC GTCTGCACCG ATTTTTTCCA TGCGTTTAGC TAAAGCGACA 17760 

CTAGGAACAA CAGGAATAAC GATTATCCCA GCTTCATGGA AACGTTCCAT ATACTTGCTT 1782 0 

GGATTTCCTG CTCCTGTTGT GACAACTTTA ACACCTTCTT CAATAACGAG ATCCACGATG 17880 

TCTTCCACAA AGGGAGATAA GAGCATGATG TTGACCCCAA AGGGTTTATC AGTCAATGAT 1794 0 

TTGATTTTAT CAATATTGGC CTTGACAACT TCTTTCGGGG CATTTCCCCC ACCGATAATT 18000 

CCTAATCCTC CAGCCTTGGA AACAGCCCCT GCCAAATCAC CATCAGCAAC CCAGGCCATC 18060 

CCTCCTTGGA AAATAGGATA ATCAATCTTC AATAATTCTG TAATACGCGT TTTCATAGTG 1812 0 

CCTCCAACCT TCCTTGCTTA CGTAATAGTT CGATTTCACC ATAATTTGAC AGTCAAACTA 18180 

TTACCTAAAC AAGAGGGAGT GGGTTTCTCC CTACTCCTTC TACTAATATT CTGCTTATTT 18240 
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TGCTTGCTCT TCAACGTAAG CAACCAAGTC ACCAACTGTT TTCAAGTCAT TTTCTGCTTC 183 00 

GATTTGGATA TCAAAAGCAT CTTCGATTTC TGAGATTACT TGGAACAAGT CCAATGAATC 183 60 

TGCGTCCAAA TCATCAAAAG TTGATTCAAG TGTTACTTCT GATGCGTCTT TTCCAAGTTC 184 2 0 

TTCAACGATA ATTTCTTGTA CTTTTTCAAA TACTGCCATG ATAGGACTCC TTTAAAATAA 18480 

ATAGTTTTTT TATAACAATG TGTTCACCAC ATGATTACCT AAATTGTAAG AATGAGCGTG 1854 0 

CCCCAGGTCA AGCCTCCACC GAAGCCTGAT AGAAGAACAG TCTGGCTACC ATCTAAAGGG 18600 

ATGAGACCTT GTTCTACACA CTCTGAAAGT AAAATCGGGA TACTGGCTGC ACTGGTATTG 18660 

CCATATTCCA TCATATTGGC T GGAAGTTTG GCTCGGTCAA CACCAATTTT TCTAGCCATC 1872 0 

TTATCCAAAA TACGGTCATT GGCTTGATGA AGTAGCAGAT AATCCAAGTC TGTCACCTCT 187 80 

ATAGGAGATT CATCAATAGT CTGCTTGATA GACTTGGCTA CATCTCGAAT GGCAAAATCA 18840 

AAGACTGTGC GTCCATCCAT CTTCAAAAAC GAATCTGCAC TTTCTTGATC TGAAAATGGA 18900 

GAATGTAAAC CTGAATGCCC ATAAGTTAAA CACTCGCTGC GACTTCCATC GCTATTGAGA 189 60 

CTCTCAGCTA AGAAATGCTC TTGCTCGCTA GCTTCTAACA AGACACCACC AGCACCATCT 1902 0 

CCAAACAACA CAGCTGTTGA TCGATCCGAC CAATCGACTG CCTTAGAGAG GGTTTCACTA 19080 

CCAATCACCA AGCCTTTTTG AAAGCGACCA GAAGCGATAA ACTTTTCAGC AGTTGAAAGA 19140 

GCAAATACAA ATCCACTGCA AGCCGCGGTT AAGTCAAAAG CAAAGGCTTT ATTAGCACCA 19200 

ATATTAGCTT GAACACGAGC AGCTGTAGAG GGCATCATCG AATCTGGAGT AATGGTAGCT 19260 

AGGATGATAA AATCCAGTTC TTCTCCTGTT ATTCCAGCTT TTGCCATCAG TTTCTTAGCA 19320 

ACCTCTGTAG CCAAATCACT GGTAGATTCT GTTCTTGAAA TATGCCTTTG TCGTATTCCC 19380 

GTTCGACTTG AAATCCACTC ATCATTGGTA TCCATAATCT GAGCCAAGTC GTGATTTGTA 194 4 0 

ACCACTTGCT CTGGCACATA ATGAGCAACC TGACTTATTT TTGCAAAAGC CATTATTTCA 19500 

AATCCTCCAA AAATTGGTAA AGATTAGTCA AACCTTTACC CATGACAGCA ATTTCTTCCT 195 60 

CGCTCATGCC ATCAATAATT TTTTCTACCA TGGCCTTGTG GAAGCGTTTA TGCAGTCTAT 19620 

GAATCAAGCG ACCCTTCTTT GTCAAATGCA GATGCACCAC ACGACGATCC TGTTCTGACC 19680 

GAACTCGCTC AATGTAGCCC GG 197 02 
(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH : 6211 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY : linear 
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(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

GAAAATTTCC TCTCTTCTCT TGAAAAATTT TGAAAAAATG GTATGATAGT AACAAGTTAT 60 

TTTTAAGAGG AAAGAAAGGG GAATAATGGA GAAAATCAGT TTAGAATCTC CTAAGACGGG 12 0 

GTCGGACCTA GTTTTGGAAA CACTTCGTGA TTTAGGAGTT GATACCATCT TTGGTTATCC 180 

TGGTGGTGCG GTTTTGCCTT TTTATGATGC GATATATAAT TTTAAAGGCA TTCGCCACAT 240 

TCTAGGGCGC CATGAGCAAG GTTGTTTGCA TGAAGCTGAA GGTTATGC'CA AATCAACTGG 3 00 

AAAGTTGGGT GTTGCCGTCG TCACTAGTGG ACCAGGAGCA ACAAATGCCA TTACAGGGAT 3 60 

TGCGGATGCC ATGAGCGATA GCGTTCCCCT TTTGGTCTTT ACAGGTCAGG TGGCGCGAGC 42 0 

AGGGATTGGG AAGGATGCCT TTCAGGAGGC AGACATCGTG GGAATTACCA TGCCAATCAC 480 

TAAGTACAAT TACCAAGTTC GTGAGACAGC TGATATTCCG CGTATCATTA CGGAAGCTGT 54 0 

CCATATCGCA ACTACAGGCC GTCCAGGGCC AGTTGTAATT GACCTACCAA AAGACATATC 600 

TGCTTTAGAA ACAGACTTCA TTTATTCACC AGAAGTGAAT TTACCAAGTT ATCAGCCGAC 6 60 

TCTTGAGCCG AATGATATGC AAATCAAGAA AATCTTGAAG CAATTGTCCA AGGCTAAAAA 72 0 

GCCAGTCTTG TTAGCTGGTG GTGGAATTAG TTATGCTGAG GCTGCTACGG AACTAAATGA 780 

ATTTGCAGAA CGCTATCAAA TTCCAGTGGT AACCAGTCTT TTGGGACAAG GAACGATTGC 840 

AACGAGTCAC CCACTCTTTC TTGGAATGGG AGGCATGCAC GGGTCATTCG CAGCAAATAT 900 

TGCTATGACG GAAGCGGACT TTATGATTAG TATTGGTTCT CGTTTCGATG ACCGTTTGAC 960 

GGGGAATCCT AAGACTTTCG CTAAGAATGC TAAGGTTGCC CACATTGATA TTGACCCAGC 1020 

TGAGATTGGC AAGATTATCA GTGCAGACAT TCCTGTAGTT GGAGATGCTA AGAAGGCCTT 1080 

GCAAATGTTG CTAGCAGAAC CAACAGTTCA CAACAACACT GAAAAGTGGA TTGAGAAAGT 114 0 

CACTAAAGAC AAGAATCGTG TTCGTTCTTA TGATAAGAAA GAGCGTGTGG TTCAACCGCA 12 00 

AGCAGTTATT GAACGAATTG GTGAATTGAC GAATGGAGAT GCCATTGTGG TAACAGACCT 12 60 

TGGTCAACAC CAAATGTGGA CAGCTCAGTA TTATCCCTAC CAAAATGAAC GTCAGTTAGT 132 0 

GACTTCAGGT GGTTTGGGAA CAATGGGCTT TGGAATTCCA GCAGCAATCG GTGCTAAAAT 1380 

TGCTAACCCA GATAAGGAAG TAGTCTTGTT TGTTGGGGAT GGTGGTTTCC AAATGACCAA 1440 

CCAGGAGTTG GCTATTTTGA ATATTTACAA GGTGCCAATC AAGGTGGTTA TGCTGAACAA 1500 

TCATTCACTT GGAATGGTTC GCCAGTGGCA GGAATCCTTC TATGAAGGCA GAACATCAGA 1560 

GTCGGTCTTT GATACCCTTC CTGATTTCCA ATTGATGGCG CAGGCTTATG GTATTAAAAA 1620 

CTATAAGTTT GACAATCCTG AGACCTTGGC TCAAGACCTT GAAGTCATCA CTGAGGATGT 1680 
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TCCTATGCTA ATTGAGGTAG ATATTTCTCG TAAGGAACAG GTGTTACCAA TGGTACCGGC 1740 

TGGTAAGAGT AATCATGAGA TGTTGGGGGT GCAGTTCCAT GCGTAGAATG TTAACAGCAA 1800 

AACTACAAAA TCGTTCAGGA GTCCTCAATC GCTTTACAGG TGTCCTATCT CGTCGTCAGG 186 0 

TTAATATTGA AAGCATCTCT GTTGGAGCAA CAGAAGATCC GAATGTATCG CGTATCACTA 1920 

TTATTATTGA TGTTGCTTCT CATGATGAAG TGGAGCAAAT CATCAAACAG CTCAATCGTC 1980 

AGATTGATGT GATTCGCATT CGAGATATTA CAGACAAGCC TCATTTGGAG CGCGAGGTGA 2040 

TTTTGGTTAA GATGTCAGCG CCAGCTGAGA AGAGAGCTGA GATTTTAGCG ATTATTCAAC 2100 

CTTTCCGTGC AACAGTAGTA GACGTAGCGC CAAGCTCGAT TACCATTCAG ATGACGGGAA 2160 

ATGCAGAAAA GAGCGAAGCC CTATTGCGAG TCATTCGCCC ATACGGTATT CGCAATATTG 222 0 

CTCGAACGGG TGCAACTGGA TTTACCCGCG ATTAAAAATC CAACTTAAAT TTATTAAACC 2280 

AGCCTAAAAG GCAATAAATA ATAGAAAAGA GAGAAAAGCT ATGACAGTTC AAATGGAATA 2340 

TGAAAAAGAT GTTAAAGTAG CAGCACTTGA CGGTAAAAAA ATCGCCGTTA TCGGTTATGG 2 400 

TTCACAAGGG CATGCGCATG CTCAAAACTT GCGTGATTCA GGTCGTGACG TTATTATCGG 2 4 60 

TGTACGTCCA GGTAAATCTT TTGATAAAGC AAAAGAAGAT GGATTTGATA CTTACACAGT 2 520 

AGCAGAAGCT ACTAAGTTGG CTGATGTTAT CATGATCTTG GCGCCAGACG AAATTCAACA 2580 

AGAATTGTAC GAAGCAGAAA TCGCTCCAAA CTTGGAAGCT GGAAACGCAG TTGGATTTGC 2 640 

CCATGGTTTC AACATCCACT TTGAATTTAT CAAAGTTCCT GCGGATGTAG ATGTCTTCAT 2700 

GTGTGCTCCT AAAGGACCAG GACACTTGGT ACGTCGTACT TACGAAGAAG GATTTGGTGT 27 60 

TCCAGCTCTT TATGCAGTAT ACCAAGATGC AACAGGAAAT GCTAAAAACA TTGCTATGGA 2820 

CTGGTGTAAA GGTGTTGGAG CGGCTCGTGT AGGTCTTCTT GAAACAACTT ACAAAGAAGA 2 880 

AACTGAAGAA GATTTGTTTG GTGAACAAGC TGTACTTTGT GGTGGTTTGA CTGCCCTTAT 2 940 

CGAAGCAGGT TTCGAAGTCT TGACAGAAGC AGGTTACGCT CCAGAATTGG CTTACTTTGA 3 000 

AGTTCTTCAC GAAATGAAAT TGATCGTTGA CTTGATCTAC GAAGGTGGAT TCAAGAAAAT 3 060 

GCGTCAATCT ATTTCAAACA CTGCTGAATA CGGTGACTAT GTATCAGGTC CACGTGTAAT 3120 

CACTGAACAA GTTAAAGAAA ATATGAAGGC TGTCTTGGCA GACATCCAAA ATGGTAAATT 3180 

TGCAAATGAC TTTGTAAATG ACTATAAAGC TGGACGTCCA AAATTGACTG CTTACCGTGA 3240 

ACAAGCAGCT AACCTTGAAA TTGAAAAAGT TGGTGCAGAA TTGCGTAAAG CAATGCCATT 3 300 

CGTTGGTAAA AACGACGATG ATGCATTCAA AATCTATAAC TAATTAGAAA TATATAGCGC 3 3 60 

TGGAGATGAT TTTATGAAAA AGATTATGAG AAAAATTGCA TCGTTATTAT TGGTTCTAGT 3420 
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TGTATAATGT AATTACACCG TCGGTAATAG TGCTAGCAGA CCAAAATAAA GCAGATTGGT 34 80 

CGTATGATGA AAATGCTGTA ATTAACATTT ATGATGATGC TAATTTTGAA GATGGTAGGT 3 540 

TGCATATGAA CTTTGAACAA TTCTTCAAAT TGGCACAAAT AGCTAGAGAA GAAGGTCTTG 3 600 

AAATTCATTC TCCGTTTGAG AGAGCTGGTG CGACTAAATC TGCTCGTTAT ATAGCGAAAT 3660 

GGATTTTGAG AAATAAAAAA CATTAACAAA TATAGTTGGT AAATCATTAG GACCTAAATC 37 2 0 

AGCTGTTAGA TTCGGAGAAG CTTTATCCTA TATTGAAGGT CCTCTTCGCA GAATAAATGA 37 8 0 

G AC GAT AG AT GGCGGTTTAT ATCAAATAGA GCAAATTATT GCATCTGGAT TGAAAGAATC 3840 

GGGTTTAAAT GACTGGACTG CGAAAACTTT AGCTTCAGCT ATTCGTGGGA TATTAGATGT 39 00 

ACTTATTTAG GGGTTGAAAT CATATGAATA TTACCAATTT GTTTTCTATC AAGACAGGAT 39 60 

GTGATGAAAC TGATAGGCAA CTGCAAAAAC TATTTTTTCA GTTGGATTTA CAATTGGGAG 402 0 

AATTGACAGA TCAACTAAGA AAATTAGATT CTAATTTTGT TCCTCGTAGT CAATTTGTAG 4080 

ACACGTTGGA TTTGAATGAT GTAGAATATA AAGAAATTTT AAACTATTTT ATCTTCCATC 4140 

GTAATGATAG TGAAGAAAGT TTGGTAGAAT GGTTATATGA TTGGATTTCC ACAAATCGTT 42 00 

ATGAACTTCC TAAAGAGTTT TCGATTCGTA TGGCTCATAA ATACCATGAA AGTGTTACTG 42 60 

AAGTTTTCGG AGATGAATAA CTAAAAAACA GTCATTAGTG ACTGTTTTTT ATAGAAAAAG 432 0 

AGGTTTTATA TGTTAAGTTC AAAAGATATA ATCAAGGCTC ACAAGGTCTT GAACGGTGTG 438 0 

CTTGTCAATA CTCCACTGGA TTACGATCAT TATTTATCGG AGAAGTATGG TGCTAAGATT 4440 

TATTTGAAAA AAGAAAATGC CCAGCGTGTT CGCTCCTTTA AAATTCGTGG TGCCTATTAT 4500 

GCCATTTCCC AGCTCAGCAA GGAAGAACGT GAACGTGGGG TAGTCTGCGC TTCTGCGGGA 4560 

AATCATGCGC AGGGAGTAGC CTATACTTGT AATGAAATGA AAATTCCTGC TACTATCTTT 462 0 

ATGCCCATTA CTACGCCACA AC AAAAG AT T GGTCAGGTTC GCTTTTTTGG TGGGGATTTT 468 0 

GT AAC T ATT A AACTAGTTGG AGATACCTTT GATGCCTCAG CCAAAGCAGC TCAAGAATTT 4740 

ACAGTCTCTG AAAATCGTAC CTTTATTGAT CCTTTTGATG ATGCTCATGT TCAAGCAGGT 4800 

CAAGGAACAG TTGCTTATGA GATTTTAGAA GAAGCTCGAA AAGAATCGAT TGATTTTGAT 4860 

GCTGTCTTGG TTCCTGTTGG TGGTGGCGGT CTCATTGCCG GGGTTTCTAC CTATATCAAG 492 0 

GAAACAAGTC CAGAGATTGA GGTTATCGGA GTAGAGGCGA ATGGAGCGCG TTCCATGAAA 4980 

GCTGCCTTTG AGGCTGGAGG TCCAGTAAAA CTCAAGGAAA TTGATAAATT TGCTGATGGG 5040 

ATTGCTGTGC AAAAGGTAGG TCAGTTGACC TATGAAGCAA CTCGTCAACA TATTAAAACT 5100 

TTGGTAGGTG TCGATGAGGG ATTGATTTCT GAAACCTTGA TTGACCTTTA CTCTAAGCAA 5160 

GGGATAGTCG CAGAACCTGC TGGAGCGGCT AGTATCGCCT CTTTAGAGGT TTTAGCTGAA 5220 
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TATATTAAGG GGAAAACCAT TTGTTGTATC ATTTCTGGAC GAAATAATGA TATCAACCGT 52 80 

ATGCCAGAAA TGGAAGAGCG TGCCTTGATT TATGATGGTA TCAAACATTA CTTTGTGGTC 53 4 0 

AATTTCCCAC AACGTCCAGG AGCTTTGCGT GAGTTTGTAA ATGATATCCT GGGGCCAAAT 5400 

GATGATATCA CACGTTTTGA GTATATCAAA CGAGCTAGCA AGGGAACAGG CCCAGTATTA 54 60 

ATTGGGATCG CTTTAGCAGA TAAGCATGAT TATGCAGGTT TGATTCGTAG AATGGAAGGT 552 0 

TTTGATCCAG CTTATATTAA CTTAAATGGT AATGAAACGC TTTATAATAT GCTTGTCTGA 5580 

GGACTAATAA AAAAATATCA TACCTTCATT TTGATTTCCT ATCTATTGAC AAGCATAGTC 564 0 

ACACTGTCTT TAATACTCTT CUAAAATCTC TTCAAACCAC GTTAGCTCTA TCTGCAACCT 5700 

CAAAACAGTG TTTTGAGCAA CTTGCGGCTA GCTTCCTAGT TTGCTCTTTG ATTTTCATTG 57 60 

AGTATAAGGT ATGATTTGAT TTCTTTTTGT TGACAAATAT ACTATATTAA AAAGATATAT 582 0 

AAGTAATTAA CTGAGCTTAT CTGTCTTGTC ATCTCTATTA AGGATGGTTT AGATAATCGG 5880 

GTGTCTGCTT CTAGGCTAGC ACCTCAATAT CCAAAGGAGT GATGAATTTG AAGGACATAA 594 0 

GGAATACCTA TCTCTCAGAT GATTTATTGA GGAAGAAAGA TAGGAGTTTT TGAGCTAGTG 6000 

AAGGCTTGGA TTTCTAAAGG TTAGAACTAT CATCTTCAGT TCTTAAATCG AAGAAATAAG 6060 

CTATCTTACG GAAATAGAGA AGCATTTTTT AAGAACTTGA ATAATTTCGC ACCTTAAGAG 612 0 

GGTAATAATA CAGTATTTTT ATTAGCAAAT ATTTATGGTG TAGAGGCTAG CAAAACCTAT 6180 

ATATTATCGG ATTTAAAAAG GAAGTAAGAA A 6211 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH : 7939 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

CCGGACTCCC CACGATTCTT CAAAATAACT GAGTATATTT CTATCTTGAT TTTCAGATAT 60 

AAATTCTTCC TTCTGTGGCC TCTTCTTACG CTTGAGAAGA GCTTCTCCGA CATGGCTTCT 120 

TCCTTACTGA GCAAAACCTT GAGCATAGAT AAGTTTGACT GGCAAGCGTG CTCTTGTATA 180 

TTTGGCTCCC TTCCCACTAT TGTGGATAGC GAGGCGTCTT CTCATATCAG TCGTATAGCC 240 

TATATAGTAG GATCCATCAC GACACTCCAG AACGTACATA TAAGCCTTAT GATCCATAAT 300 

AAATCTCTTC GATTTCGGGC GTATAAGAGC CATCATCATT GTGGACAATC AAAGGAGGTA 360 
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AGACCTTAAA GCCACTTGTT GAGCCATCCT TGATCGCCTC AATCAAAAGC ATATTGGCTT 420 
CCTTTTCTCT TTTTGGATAA ACAAACTGCA GGCGCTTAGG GGCTAGATTA TGTCGTTTTA 480 
ACGTATCCAA AATATCCAGA AGTCGATCAG GACGATGAAC CATGGCCAAA CGCCCATTAG 540 
ACTTGAGAAT ACTCTGGGCA CTACGACAGA TTTCTTCCAA ATTAGTCGTG ATTTCGTGTC 600 
GAGCCAAGAG ATAATGTTCA CTCTCGTTCA GATTAGAATA AGGATTCACC TTGAAATAGG 6 60 

GTGGATTACA CAAAATCATA TCCACCTTAC TCCCCTGAAT GTGAGCAGGC ATATTTTTCA 720 

AATCATCGCA GATGACCTGC ATTTGCTCCT CTAATCCATT CAAACGGACA GAGCGTTCAG 7 80 

CCATATCCGC CAAACGCTCC TGAATCTCAA CAGACAATAT CTGTGCTTGA GTACGAGTGC 84 0 

TAGCAAAAAG CCCCACTGCT CCATTCCCAG CACAGAAATC CACAATCAAC CCCTTCTTAG 900 

GAAAACGTGG AAATCGTGAT AAGAGAACAC TATCCACCGA ATAGCTAAAA ACCTCTCTAT 9 60 

TTTGAATGAT TTTGATATCT GTCGAAAAGA GCTGGTTAAT GCGCTCTCCT GATTTTAATA 102 0 

ATTGTTCTTC TTCCATGGTC CTATTATAGC AAATTCATAT TAACATTACA AAAAATATAA 1080 

AACTCTAAAC TACTTCTTCT TTTTTAAATG GTGCAGGGCT TCTCCAGTCC AGATTGGTAG 114 0 

CATTCGTCGA AAGGGAGCAA AGCCGTAGTT AAAGCGGTCG CTTGAAAAGC GTCTCCGTCT 12 00 

AGGAAACTGG TACTTTTCTT CCTCCAAAGT GCGGATAGAA AGACTGGCTT TCCCTGTAAA 12 60 

TTCATCTAAA TCCACTACCT GAACTTGAAC CTCTTCATCG ACTTTCAAGG TTTCATGAAT 1320 

ATTTTCAATA AATCCTGTCC GAATCTCTGA AATGTGAATC AGCCCCGTAT CACCCGTCTC 1380 

TAACTCAACA AAGGCACCGT AGGGCTGAAT CCCTGTAATA CGCCCCTTTA GCTTATCACC 144 0 

GATTTTCATC TTAGTCCTCG ATTTCAATAG TTTCAATTAC AACATCTTCA ACTGGCTTGT 1500 

CCATAGCTCC TGTCTCAACA GCAGCAATGG CATC C AAG AC AGCGTAAGAT GCTTCATCAG 1560 

CTAACTGACC AAAAACCGTG TGACGGCGGT CTAGGTGAGG TGTCCCACCT TGATTGGCAT 162 0 

AGATTTCTGC AATCGGTTCT GGCCAACCAC CACGAGTAAT TTCTTTCTTA GAATAAGGTA 168 0 

GGTGTTGGTT TTGCACGATA AAGAACTGGC TGCCGTTGGT ATTTGGACCA GCATTTGCC'A 1740 

TGGAAAGAGC ACCACGGATA TTGTAAAGCT CTTCTGAGAA TTCATCCTCA AAAGATTCGC 1800 

CGTAGATTGA CTCGCCACCC ATACCAGTTC CAGTTGGGTC TCCACCTTGG ATCATAAAGT 1860 

CCTTGATAAT ACGGTGGAAA ATGACACCAT CATAGTAGCC ATCTTTTGAA AGAGATACAA 192 0 

AGTTAGCCAC TGTTTTAGGA GCATGTTCAG GGAAAAGCTT GATACGTAAG TCTCCGTGAT 1980 

TGGTCTTAAT AGTCGCAAGA GGACCTTCTA CTGTTTCAAT GTCTACTTGT GGAAAATGCA 2040 

ATTCTTTTTC TACCATACCA AATACTTCTA AGGCAGCAAA AATGCCATCT TCTTCTAATG 2100 

TTTTTGTAAT ATAATCTGCT TTTTCTTTGA TTTTATCATG AGAAATTCCC ATGGCAACGC 2160 
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TGATTCCAGC ATAATCAAAG AGTTCCAAGT CGTTGAGACC ATCTCCAAAA ACCATGACCT 2 220 

TCTCTGGTTT CAAGCCAAGG TGTTCCACAA CCTTTTCCAC CCCCGTCGCT TTGGAGCCTG 2280 

AAATCGGCAC AATATCAGAC GAATGTTGAT GCCAACGAAC CATGCGAAGT TTGTCTGAGA 2340 

GACTGTCAGG CAAGTGCAAG TCATCTCCCT TATCTTCAAA AGTCCACATC TGATAGATAT 2400 

CTTCTTTTTC ATGGAAATCG GGATCTACAT CTAAGTCGGG ATAAATTGGA TTGATAGCTT 24 60 

CACTCATCAT ATCGGTGCGA GTCGACAACT TGGCATCATG ACTCCCAACC AAGCCATACT 2 520 

CAATTCCTTC TTGCTTAGCC CAAGAGATAT ACTCCTCAAC ATCTGACTTT TCAATCTGAT 2580 

GCTGATAAAT GACCTGACCT TTTTTATCTT CGATATAAGC CCCATTCAAA GTTACAAAAA 2 640 

AGTCAGGCTT GAGATCACGA ATCTCTGGAA CAACACCAAA AATGCCACGT CCAGAGGCGA 27 00 

TTCCTGTTAA AATTCCTTTT TCACGCAACT GTTTAAAAAC AGTGGGAATT GTAGTTGGAA 27 60 

TAAACCCTGT CTTTGAATTC CGCAATGTAT CATCAATATC AAAAAAGACA ATCTTGATCT 2 820 

TCTTTGCCTT GTATCTTAAT TTCGCGTCCA TCTCACTACC TCTTTCAATC TAACTCTTTC 2 880 

CATTATATCA TAAAGTAGGC AAATCCCCTA TTTTCAAAAA GTTTATCATT TTTATTTTAA 2 940 

TTTCTTGGAT GAGAAAAGAG ACATATTTAT GAAAAAGCTC CATCGTGCTT TTAATGTGTT 3 000 

CTCTTGTTTT CAAACTCGTA AAAAGGGAGC CACTGATCCT AACTCGCTCT CTCATTTCAA 3 060 

AGCTTGTGAA AAAAGACCCG TTGGGGTCTT AATTCGCTTT CTTGTTTTCA AGCTCATGAA 3120 

AAAGAGACCC AACTGGGTCT TTTCTTTAAT CTTCGTTTAC GAAAGGCATC AAAGCCATTA 3180 

CGCGAGCGCG TTTGATAGCT GTTGTTACTT TACGTTGGTT TTTAGCTGAA GTTCCTGTTA 3 2 40 

CACGACGAGG AAGGATTTTC CCACGTTCTG AAACGAAACG GCTAAGAAGC TCAGTATCTT 3300 

TGTAATCAAC ATATTCAATT TTGTTTGCTG CGATGTAATC AACTTTTTTA CGGCGTTTGA 3 3 50 

ATCCGCCACG ACGTTGTTGA GCCATGTTTT TTCTCCTTTA TAAGTTTAGT TGTCCATTAG 3420 

AATGGTAAAT CATCATCTGA AATATCCAAT GGGTTTGTTG CTCCAAATGG ATTTTCATTA 34 80 

CGTGAAAAGT CTGGTACTGA ATTTGTAGGT GCTGAATAGT TTGCAGTTGG TGCAGAGTAA 3 540 

GCTCCACCTG TGTGACCCTC ACGCACACTA CGGCTTTCCA ACATTTGGAA ATTCTCAGCC 3 600 

ACGACCTCTG TCACGTAGAC ACGTTGTCCT TGCTGGTTAT CGTAACTACG AGTCTGGATA 3 650 

CGACCTGTCA CCCCGATAAG TGAGCCTTTT TTAGCCCAGT TAGCAAGATT TTCAGCCTGT 372 0 

TGGCGCCACA TAACGACATT GATAAAATCA GCCTCACGTT CACCATTTTG ACTCTTAAAT 37 80 

GTACGGTTTA CTGCAAGAGT AAAAGTCGCA ACTGCTACAT TTGATGGGGT ATAACGCAAC 3 84 0 

TCAGCGTCAC GTGTCATACG CCCTACAAGT ACAACATTGT TAATCATAGT TTACCTTCTT 3 900 
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ACGCGTCAAT TTTGACGATC ATGTGACGAA GAATGTCAGC GTTGATTTTT GAAAGACGGT 39 6 0 

CAAACTCTTT AAGAGCTGCA TCGTCATTTG CTTCAACGTT AACGATGTGG TAAAGTCCTT 4020 

CACGGAAATC TTGGATTTCG TATGCAAGAC GACGTTTTTC CCAAGTTTTT GATTCAACAA 408 0 

CAGTTGCACC GTTGTCAGTC AAAATAGAGT CAAAACGTGC TACCAAAGCG TTTTTAGCTT 414 0 

CTTCTTCAAT GTTTGGACGA ATGATATAAA GAATTTCGTA TTTAGCCATT GATATGTTCC 4200 

TCCTTTTGGT CTAATGACCC CAAGACTTTG CAAGGGGTAA GTGAGGTTCG CTCACAATAA 42 6 0 

ACTATTATAC TAGAAAAAAT TTTTTTACGC AAGTAAAAAC ACTAGAATTC GAAAAAACGC 4320 

CACATGGGCG TTTTCCTGTT CTTATGGTTT GATACGGTGC AACATACGTG GGAATGGAAT 4380 

AGCTTCACGG ATATGTTTTG TTCCTGCTGC GAAGGTTACC ATACGTTCGA TACCGATACC 444 0 

AAATCCTCCG TGTGGAACTG TACCGTATTT ACGAAGGTCA AGGTAGAATT CATATTCTGT 4 500 

ACGATCCATG CCAAGTTCAT CCATCTTAGC GACAAGGGCA TCGTAATCTT CCTCACGCAT 4560 

AGACCCACCG ATAATTTCTC CATAGCCTTC TGGAGCAAGC AAGTCTGCAC AAAGCACGCG 4 62 0 

CTCTGGATTT CCAGGAACTG GTTTCATGTA GAAGGCCTTG ATGGCTGCTG GATAGTTCAT 4 680 

GACAAATGTT GGCACACCAA AGTGGTTTGA AATCCAAGTT TCGTGTGGTG ACCCAAAGTC 4 740 

ATCACCATGC TCAAGATGCT CGTAGTCAGC ATCTTCATCA TTTTCATGCT CTTGCAAGAG 4 800 

GTCAATGGCT TGATCGTAAG TGATACGTTT GAATGGCTCT GCAATGTAGC GTTTCAAGAG 4 3 60 

TTCTGTATCA CGTTCCAAGG TTTCCAAGGC TTGAGGCGCG CGGTCAAGAA CACCTTGTAG 4 920 

AAGAGCTTTC ACATAAGCTT CTTGCAAGTC AAGCGACTCA TCATGTGTCA AGTATGAGTA 4980 

CTCAGCATCC ATCATCCAGA ACTCAGTCAA GTGACGGCGT GTTTTTGATT TTTCAGCACG 5040 

GAAAACTGGA CCAAAGTCAA AGACACGACC AAGAGCCATA GCCCCTGCTT CTAGGTAAAG 5100 

CTGACCTGAT TGGCTCAAGT AGGCTGGCGT TCCGAAGTAG TCAGTTTCAA AGAGTTCTGT 5160 

AGAATCTTCT GCCGCATTTC CTGAAAGAAT TGGGCTGTCA AACTTCATAA AACCGTTCTT 52 20 

GTCAAAGAAC TCATAAGTTG CATAGATAAT AGCGTTACGG ATTTGCAACA CAGCTACTTG 52 80 

CTTACGAGAG CGTAgCCACA AGTGACGGTT ATCCATCAAA AAGTCTGTTC CGTGTTCTTT 5340 

TGGTGTGATT GGGTAGTCTT GAGATTCACC GATCACTTCG ATGTCTGTGA TGTCCAACTC 5400 

ATAGCCAAAT TTAGAACGTT CGTCCTCTTT GACAATACCT GTCACATAAA CAGACGTTTC 54 60 

TTGGCTCAAG CGTTTGATAA CATCAAACTT CTCAAGTCCC ACTTCTTCAC CAAATTTTTC 5520 

GACAAAGTTT GGTTTAAAAG CCACACCTTG AAAGAAGGCT GTTCCATCAC GCAATTGTAA 55 80 

GAAAGCGATT TTTCCTTTTC CTGATTTGTT GGCAACCCAA GCGCCAATCG TCACTTCCTG 5640 

ACCAACATAG TCTTTTACGT CAATAATCGT TACACGTTTT GTCATTATTT TTCCTTTTCT 57 00 
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TTTTTATTCT TTATGGCAAA CCACCTCTAT ATTGTTCCCA TCCAGGTCAA TCATAAAAGC 57 60 

AGCATAGTAA ATCGGATGCT CACTTCGATA ACCAGGAGCC CCATTGTCTC GCCCACCTGC 5 82 0 

CTCTAAGCCA GCCTCATAAC AAGCCTGAAC TTCTTCCTTA TTTTCTGCTA AAAAAGCAAA 5880 

ATGAACAGGA TCTTGTGTTC CCTGAGTCAG CCAAAAATCA CCACCAGGAT GAGGGCTGTT 594 0 

CGGGGATAGA AAACTAATTA GAGAACTAGT CTTAAAAGCC AATTTATAGT CCAAAGGAGC 6000 

GAGAAAACTC CTATAAAATC CTTATGAAAT TTGTAAATCC TTTACCTTAA TCTCAAAATG 60 60 

ATCAATCATT CTCACTACCC ATAAATGCTT TCAAGCGTTC GACTGCTTCT TTAAGCGTGT 612 0 

CTAGGTCTGT CGCATAGCTG AGGCGGACAT TTTCTGGTGC TCCAAATCCA GCTCCTGTTA 6180 

CCAAGGCCAC TTCGGCTTCT TCTAAGATAA CAGTTGTAAA GTCTGTCACA TCCGTGTAGC 62 40 

CTTTCATCTC CATGGCCTTT TTGACATTTG GGAAGAGATA GAAGGCCCCT TGCGGTTTGA 6300 

CCACTTCAAA TCCTGGTACC TCTGCAAGGA GGGGATAGAT GGTATTAAGA CGTTCCTCAA 63 60 

AGGCCTGACG CATGCTTTCT ACAGTATCTT GCTCACCTGA TAGAGCCTCA ACTGCTGCAT 642 0 

ATTGGGCTAC TGCTGACGGA TTCGAAGTTG TTTGACCTGC AATCTTGGAC ATGGCAGCGA 64 80 

TAATGTCTGC TTCTCCAACG GCATAACCAA TCCGCCAACC AGTCATGGCA TAAGTTTTAG 654 0 

ACACACCATT GATGACCACT GTTTGCTTGC GAATCGCTTC CGATAGGCTA GAAATCGGTG 6600 

TGAACTCATG ACCATTATAA ACCAAGCGGC CATAGATATC GTCTGCTAGG ATGAGAATAT 6660 

CATTTTCTAC AGCCCAGTTT CCAATTGCCA AGAGTTCCTC ACGGGTGTAA ATCATACCTG 67 2 0 

TGGGATTAGA TGGCGAATTC AGCACCAAAA CCTTGGTCTT GTCAGTGCGA GCTGCTTCTA 6780 

ACTGCTCTAC GGTCACCTTA AAGTGATTGT CTTCCTTAGC AGAAACAAAG ACGGGAACGC 684 0 

CTTCTGCCAT CTTGACCTGA TCTCCATAGC TAACCCAGTA TGGGGTTGGG ATGATGACTT 69 00 

CATCACCTGG ATTGACCACA GCCATAAAGA AGGTATAGAG AGAATATTTG GCTCCCGCAG 6960 

CGACTGTCAC TTGATTTGAC GCTACAGAAT AGCCGTAAAA GCGCTCAAAG TAGCTATTGA 702 0 

CCGCCGCCTT AAGCTCTGGC AGACCTGAGG TTACTGTATA AAAAGAAGCA CGCCCATCTC 7080 

GAATCGATGC AATGGCGGCA TCTTGGATAT TTTTGGGAGT AGTGAAATCT GGCTCACCCA 7140 

AGGTTAGAGA CAAAATATCT CTACCCTCAG CCTTCAGTGC TTTGGCACGG GCTCCAGCAG 7200 

CCAAAGTCAC ACTTTCTTCC ATTTCTAAAA CACGGTTGGA TAGTTTCATA GGCCCTCCTT 7260 

GTTGACCAAT GCTCCTGTTT CAAAATCTAC TAGATAAAAA TCAGATCCTG ACTTAACTTC 732 0 

CCAGATTGGC TTATCTTGAT AACGGCCAAA GGTTATCTTG TCAATCTCGC CAGCTCCCTT 73 80 

TTCCTTAGAA ACCGTTTCTG CTTTTTCTTG TGAAACACCC TGATTTAGCT GATAAACGTA 7440 
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AATCTTATGG TCATCTTTAC CAATCAGGAC AGCAAGCGCT TCTTGCTGTT TGTTACGACC 7 500 

AAGAACGCTG TAATAAGATT CCAAGCCATT GTATAAATCA ACCTGATCAG CCTGCTCTAA 7 560 

TCCTGCATAC TGCTGAGCTA ATTTTTCTCC TTCACTTTTA GCTGTTTGAT AGGGTTTCAT 7 62 0 

GCTAAGAGAA ACCATATACA GAAAGGAACC ACTGATAACC ACAAACAAAA TCGTCATCCC 7 680 

TAGACCATAC TGCCACAGTA GATTATTTTT TGCTTTGTTT TGTCTTTTTT TCACTCGTCT 7740 

ATTTTACCAT CTATTAAGCT TTATTACAAG TGAATATAAG AATACTCTTC GAAAATCTCT 7 800 

TCAAACCACG TCAGCTTTAT CTGCAGACCT CAAAGCTGTG CTTTGAGCAA C C AATT C T AT 7 8 60 

TTCTCCCTTC AAACAAAACC GATTTTGAAA GTGAAACAGT TCTTACTTTT TCAGTCACAA 7 92 0 

ATGATTAGAG TTTGCCGGG 793 9 
(2) INFORMATION FOR SEQ ID NO: 10: 
( i ) SEQUENCE CHARACTERI STICS : 

(C) STRANDEDNESS : double 

(D) TOPOLOGY : linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
CCGCTCTACC GTCAAATAAT TACCATTTTG TTTAATACCG AAATTTTTAT CTACTGAAAA 60 

TTCAGTTGGT CTGTTGGTAC GATCGTCGTA TACAGTACCA TTCTCACGAA TAGTATAATT 120 

GTAATCAGTA TCACCTTGTT TCCTTAATTT AAGGTAATAA TTACCATCAA TTTGTTTATA 18 0 

ACCTGAATCT TTTCTAGTTG CTTCTCTAAA ACTTACTCCA GCAGGCATCA CATCAGCAAA 240 

CATGAGTACT TGTTTGTTCT TTTTTTCAAC AATAACAGAC TCAATATAGG TTGCACCACC 300 

GCTGATTTGT AAGTCACGTC CACCAACTTC ACGAGGCCAT TCTAATGGTA CTGGCGCAAA 3 60 

ATCATCGAAT GCCAATGTTA ATTTTGGTTT AGTCCATGTC TTACCATTAT CATCACTATA 420 

ACTTGTAGCA ATATTAATTT TATTCAAGAA ATCATGAGTT CCACCGTAAC GAGCGTCAAT 480 

GCTTGAAAAT ACCCGACCAT TGCTAAAAGT ATACAGAACT GGAATACGGA AATAGTTAGA 540 

ACCTGTTGTA TCATTAGCCG TATAAATTAA ATGTCCAGTA ACAGCGTTTG TTGTCATCTT 600 

TTTAACAGTT TCTTCATCCA ATGCACTATT AAAGAATTTG ATATTTTCTA GTGTTCCGTT 6 60 

AAAACCAAAC GCCGTTTTTC CTGCACGTTT CACTCCCCCA AGCATATAGT AATCAATACC 72 0 

TTTAATATCC TTGATGTTTA GGAAATTATC CACTTTCTTT TCTACTACTT TTGTACCATT 7 80 

TGCGTATAAA GAATATGTTT TTTTGACTGA ATCTGCTACT ACTGCAACAG TGTTAGTCAC 84 0 

AGCCTCTTGT TTGTACTTAC CCCAAACTGA AGCAGGTCTG GATACTAGGT TATTTTTATT 900 
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GGAAGAAGTA TCACGCGCTT CCATCCCCAA CTCACCATTG TCTCTAAGGA ACACATCTAC 96 0 

ATAACTATTT TGTTGACCGG CTTTGGAATT AGATATTCCA AACAGAGCTT GTAAGCCTTT 1020 

CTCACTTGAC TGATTGTACT TAATCACTAC AGTAAAGTCA CCGCTAGTAA ATTTATCCTT 1080 

TAACTCTTTA GTAACATTTT CTCCGCCCCC TGTTAAAGTA ACATTATTTT TTTCTAAGAC 1140 

AGGAGTTTCT TCCGCTGTAG AAGATGGATC CTTAACAGTA GTTTCAACTG TTCGAGGTTG 1200 

TACAGTAACT TCCGAAGAGT TATCCGATGT AGGTTGTACT TCCGAAATCG GAGTCGTTGG 12 60 

TGCAACAGGT TGCACCAACT TTGGTGTTGA TACTTCAGAA GTTTCAGTCT CCTGAGCTGC 1320 

AACTGAGTTA GCAACAAATG CTGATAATAC CACTACAGTA CCTAAGGTTA CATATTGTTT 13 80 

AATATTTTTT TTCATTTTAT TTTTCCTCGT TTAAAACTTT GATAACAAGT TTTTTAACAG 1440 

TTTCATCATT GCAATGAATC TTTGGTTGGT GAAGATCTTC TTCAAAAGTC ACCAACATAT 1500 

TCCCTGGAAG CAATTCAACA ATTTGATAGT CTTTGCTATC GTAAAAAGCA ATATCCTTCT 15 60 

CTTCGCTAAA AGGTACACGT GACTGGGCAC GAACTGGGGA AGTTACTGCC ATTTTTTCAG 1620 

TATTTTCAAC AACAATATGA ATATCTAAAT ATTTCTTATG AGTTTCAAAA ATATCTCCTG 1680 

GAACTCCATC AGCTAGATAA GTCATACAAT TTGCAAAAAC ATTTTCCCCG TCAATATCAA 1740 

TTTTTCCATC AACTAAATCT GTCAAATTTG TATTTTCTAA AAAATCACAG ACTTTTGAAA 1800 

AATATTTATT GACAGAAGCA TATCGTTTAA AATCAGATTG TTCAGAAATA ATCATATTAT I8 60 

TTTCTCTTTT CTATTAGTGA CGAACTTCCC AACTTGAATC CGCTTTAATT TCTGTAATAT 1920 

CATGAATCGT TGTATATTTA GGTGCAGATA CTTTATTTCC AGTAAGAACA GATACAATAT 1980 

AACCTGAAAC TACTGATACA GAGATTGAAA TCAATGAATA TGCCCAGTAG CTAACAGCTG 2 040 

TTGGAGGAAG GAAGTATTTA ATAAATACCA TGACGATGGT TGATACAATC AGCGCTGCAT 2100 

AAGCACCTTG TTTATTTGCT TTTTTAGAAA CAAATCCAAG AATAAATACA CCACCAAGTA 2150 

GACCAAGTAC AAGTCCCATG AAACTATTGA ACCATTCGTA TGCAGATTTA ATATCTGAGT 222 0 

GAGCCATGAC AATGGAAACA CCAATTGAGA ATAAACCTAC TGCTAGAGAT ACGAATTGTG 22 80 

CAATTTTCGT ACGACGATTG TCTGACATAT TTTTAGAAAT GACATCTTGA ATATCCAATG 2 340 

TCCATGAAGT TGCAACAGAG TTCAAACCTG TTGAAATAGT TGATTGAGAT GCTGCATAAA 2 4 00 

TCGCTGCCAA GATCAAACCT GTGATACCTA CTGGTAACTG GTATGCAATA AAGTACATAA 2460 

AGATTTGGTC TTGAGGGATA TTGCTAGCTG CACTATCTGC ATTTTGTACT TGATAGAATA 252 0 

CGTACAAGCC TGTACCAATC AAGTAAAAGA CTGTTGCAGT TGCAAGTGAC AAAACACCGT 25 80 

TTGTGAACAA CATCTTATTA AGTTTCTTAA TATTTTGTGT TGTAGTAAAA CGTTGAACCA 2 640 
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AATCTTGAGA TGAAGCATAG GAAGACAAGA TTGTAAAGCC TGAACCCATC ACAATTAAAA 27 00 

AGATGGAGTT TGAAAGCAAG TTAGGATCGA AAAGTTTTTC ATTTGCAGCA AGGAATTTCC 27 60 

CGTTTGCTAA TGTTTCTGCT ACTGCACCAA AGCCACCTTT AATATTAGCA ATCAGTACAA 282 0 

ATAAAGCTAA AACGACACCA CTAATCAGAA TCACACCTTG AATAAAGTCT GTCCATAATA 2880 

CGGATTTTAG ACCACCAGTA TAAGAATAAA CAATTGCAAC TACACCCATC AAAATAATCA 2940 

AAATATTGAT GTCAATTCCT GTCAATACTG ATAAACCAGC TGATGGGAGG TACATAATGA 3000 

TAGACATACG TCCCAATTGA TAAATAATAA ACAAGAGTGC TGAAATAATA CGAAGTGCTT 3060 

TAGAATTAAA ACGTTTATCC AAGTAATCAT ATGCCGTATC GATGTCTATC CGTGCAAAGA 312 0 

TAGGTAAGAT AAAACGAATT GTCAGTGGAA TAGCTACTAC CATCCCTAAT TGAGCAAACC 3180 

ATAAAATCCA GCTACCTGCA TAAGAGCTAC CAGCGAGTCC CAAGAAGGAA ATCGGACTGA 3240 

GCATTGTGGC AAAAATGGAT ACCGAAGTAA CATACCAAGG AACCGAACCA TCTCCTTTAA 3300 

AGAACTCTTT TCCTTTCATC TCTTTTTTAG AGAAATAGAT ACCTGCAACC AACACCGCAA 33 60 

GTAAATAAAC AATCAAGATA ATTAAGTCAA TTATTGTAAA TCCTGTTGTG CCCATAACAT 342 0 

ATCTCCATAT TGATTTTATT TATTATAAAA ATTCTTTTCG TGCTTGTTGA ATAAGTTCTG 3480 

CTGCTTGTTT TGCAACTTCC AAGTCACCTT CTGCCAATGC TTCTAAAGGT TGACGAACAG 3540 

AACCTAAATC AAGTTTTTCA TTTAGACGCA AAACTTCTTT TGCTACAGCA T AC AT ATT T G 3 600 

CCTTACCTGA TATCATCTTA TAGATAACTT CATTGATAGC ATATTGAAGT TTTTTAGCTG 3660 

TATCTAAATC TCGTTCTTGA ATCAAACTTT CCAATTTCAA GAACAAATCT GGCATAACGC 372 0 

CATAAGTACC ACCAATACCA GCTTCTGCTC CCATCAAGCG ACCACCAAGA TATTGTTCAT 3780 

CTGGACCATT GAATACAATG TAATCTTCTC CACCTGCAGC TACAAACATT TGAATATCTT 3840 

GTACAGGCAT AGAAGAATTT TTAACTCCAA TCACACGAGG ATTTTGACGC ATTGTTGCAT 3 900 

ACAAACTACC AGTCAACGCA ACCCCTGCCA ATTGTGGAAT ATTATAGATA ATAAAATCTG 3 960 

TATTTGACGC AGCTTCACTC ATTGCATTCC AATATGCTGC GATTGAATAC TCTGGCAATT 4020 

TGAAATAAAT AGGTGGGATA GCTGCAATAG CATCGACTCC AACACTTTCT GAATGTTTTG 4 080 

CCAATTCGAT ACTATCTTTC GTGTTATTAC ATGCAATATG GTTGATAACT GTTAATTTAC 4140 

CTTTAGCAAC TTCCATAACA GCTTCAATAA TTTGTTTACG ATCTTCTACA CTTTGGTAAA 4200 

TACATTCACC TGAAGAACCA TTTACATAGA TACCTTTTAC ACCTTTGTCA ATGAAATATT 42 60 

GTACCAGAGA TTTTACACGA TCTTGGCTAA TTTCACCATT TTCATCATAG CAAGCATAAA 4 320 

ATGCAGGGAT AACGCCTTTG TATTTAGTTA AATCTTTCAT CAGATTTCTC CTTTATATTG 4380 

TTTTTTATTT GATGACATTA ATAAATCGCT GAGCAATTTC TTTTGGACGT GTAATCGCTC 4440 
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CACCAATGAC TACACTGGTA ACACCTAAAC TATAAGCTTT TTTTAATTGT TCTGGATAAT 4500 

GAATTTTTCt TCGGCAATTA CCGGAATATT AAAATCAGCC AATTTTTTCA TTAGTTCAAA 45 60 

ATCAGGCTCA T CTG ATTGT A CACTTGTACT TGTGTAACCT GATAATGTTG TACCAACAAA 4 62 0 

ATCAACGCCT GATTTAAATG CATAGAGACC TTCATCTAAA TTACTTACAT CCGCCATCAG 4 680 

CAATTGATTC GGATATTTTT CTTTTATTTT TTTGATAAAT TCACTGACAA CTAAGCCATC 474 0 

ATATCTTGGT CTTAAAGTTG CATCAAATGC AATGACTGTT GTTCCGCATT CTACAAGTTC 4800 

ATCTACTTCT TTCATCGTAG CAGTAATATA TGGTTCTTGA GGTGGATAAT CCCTTTTGAT 48 60 

AATTCCAATT ATTGGTAAAT CTACTACTTT CTGAATTGCT TTAATATCAC GCACAGAATT 492 0 

TGCGCGAATG CCCACTGCTC CTGCCTCTAA AGCTGCTTTA GCCATAAAAG GCATCAAGCT 4980 

AAATTCTTCA TTATAAAGGG CTTCACCAGG TAAAGCTTGA CAAGAAACAA TGACTCCACC 504 0 

TTGAACTTGG CTTATAAATT TTTCTTTAGT CCAAATTTGG CTCATTTTAT TATTCCTCCT 5100 

TATGGATAAT AGTTTGATTG TAATAATATT GTCTCTCTGG ACTTTCCAGA TAATTAGAGA 5160 

ATAAGCAGTC TGTAATTAAA AGTATTGGAA ACTGAGGTGA TATGCGATTG CCATACGAGA 522 0 

GATGATCGGT CGAAGCTAAT AACAATAGTT CATCAAAGAA ACAATCTTCT TCGTCAAATT 52 8 0 

TTCTTGTAGT CATTAAAACT GTTTTAGCGC CTTTATCTGC AGCTTTTTGT AGACCTTCTA 5340 

GTACAATATC AGTTTGACCT GAAATGGATG CTCCAATGAC AAGGCAATTT TCATTAAGTA 5400 

GTAAGCTACT CCACAAAATC ATATCCTCGT CTGATAATAC TTCACCAATC ACTCCGAGAC 5460 

GCATAAATCT CATCTTCATT TCTTGTAAAG CAAGAACAGA ACTTCCTTTA CCGTAGAGAT 5520 

ATACACGCTC AGCAGTTTCT ATCATCTCAG CAATACGCTC AAGTTGAACT TCATCAAGAA 5 580 

CCCTGTAAGT TTTTCTCAAC ATTTCCTCAT AGTCGGATAA AACTTTTTCT GTTGCCTCTG 5 640 

TATATAATGC CAACTTTTCT TTCTCATGAA TCATCTCTTG GTATTTGAAA ATGAATTGTC 5700 

TAAAACCTTT AAAACCACAT TTTTTCGCAA ATCGAGTCAA TGTTGCTTTG GATACATTAA 5760 

GGTATTCGCA CAATGCTTTA GATGAATAAT CATTCAGAGG TTGCTGTTTT AAGAAGAATT 5 820 

TAGCAATGTC TTTTTCAGCA TATGCCATAT TTGGTAAGTT AGCTTCTATC ATTGGAATTA 5 880 

GTTCTTTTTG CAGTAACATA TGAGCTCCTT AGTTGAAGTA AACGTTTACA TTCTTTATTT 5 940 

TAACACTTTT TTTTTTTTTC AATATTTTTC ATAAATTAGA AACTAGTTTC CAATTTCTTT 6000 

CGTTTCATAA CAGAACAACA AACATAAAAA TATAATAGTT TTTATTCTTT TTATCGTAAT 6060 

TATATGTATT GTAAGAACGT TTATCACTAA TAATATGTTC ATATTAAAAT ATTTTAGTAA 6120 

TAT T TT AT T T TGGTTTTATT ATTTCTTTTC GGAATTTCTA TATAATATTT TATTTCTAAA 6180 
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AAAATTGAAA AAATATTTCT AGTTTCTTTA TTTTATATAG GTAATATATT TTATTTCTAA 624 0 

ATTAAAAGAG AATCCCATAA AAACTACAGA TTTATGAGAT AAATCAGGTC ACCTATTTTA 63 00 

AAAAAGCAGC AAAC TAT AAA CTAAAAAGTT CCACACCAAA TGTAACCCCA TACTTCCCCA 6 3 60 

TAAGTCAGAT TTATAGCGCA CCATACCTAA AAACATTCCA AGTGAAACGT ACAGACACCA 64 2 0 

AGCTAGAATG GTTCCTGGAT GATGTACTAA GGCAAATAAA ACACTTGTCA AAGCAACTCG 64 80 

AATATCTAAT TTTCTAACCA AGTTCCATAA AATTTCACGA TACAGAAATT CTTCAACCAT 654 0 

ACTCGCATTG ATTAAGAACA ATAAAAATGA AAACCAAGGA ACTTGATGTT GAAGGCCAAT 6600 

TAAATTTGTT TGATTCGTGC TTCCTTGAGC ATGAATCAGG CTAAAACATA GACTTATAAT 6660 

CAGTAGACTA GCTAGTCCAA TACCAAGGCA TTTCATCCTA GTTTTCATAT TGACCTTGAC 672 0 

CACTTGTTTT CGTTGACCAT ACATCCATAA AAAAGAAAAA AGAGACGCAC CATAGAGAAC 67 8 0 

CTGTAGTATA GTTAACTCAC CGATACAAAG AAATTTCAAT AAGTATAGAG ATACCAATAG 6840 

GACATTTACT TGTTGGAATA TATAAACTGG AATTATTCTT TTCATAGTTA CCTCCGAAAT 6900 

AAATCTTCAT AATCTAAATC TAATATCTGC ACAATCCTTT CTACCCATGG ACTTTGAGGC 69 60 

ATTCGTTGTT CCATCTTGTA GTGGCGAATC TTTTGATATA AACGATTCAA TTCACTTGGA 7 02 0 

TAGTGAAACT CTCCCGCAAA CATTTTTCTG GTTAACTCAA TCCAGCTGAT ATTTCTTTCA 70 80 

GCCAAAATAA TGGACAAGTT CTCCCAAAAT CGTTCAGCCA TATTrCTTCT CCTTTAGTTA 7140 

GATAAATAAT GTGTTTGyGC CATGTAAATC AATTGTTTCG TATCTCTTGG CAATAGAGCT 72 00 

CTAGCCTCTT CCAAATTCAG ACTTGGATAA ACCCGCTTAT TTGAAACCAC AAAAGGAAGT 72 60 

CCGATGGTTA GTTCAGGATT TTTTAAAATT ATCTCAACGA AATCCGTTAA TCTTAGATTG 73 2 0 

TCACGGTTCT TAAATCGTAA TAAATTGGGA GATAAAAACT CAAAAC AAT C TGAAGAATAG 73 80 

CTCATCATCT CAATTAATTT GTCCTTTGTC ATTTCAGAAA CTGAATGACA AGATACCTCA 744 0 

ATGCCATAGT TTTGGAAGAA GTCTAAAAGA AGTTGATTTC TTTGGCTATT TTTACTTAGA 7500 

TAGAGATCAA TCATGGGAGA CCTCCAACAA ATTTGCTTCC ATTTGATATT CTGAGACGAT 75 6 0 

TAAGGAATCT AACAACTTTG AGAAGTTAAT CGATTTCTTG TCTTCATCAT AAGCTTTTAC 7 62 0 

AGTTACTTGG GTTGTAAGTA TCCCCTCTTT TCCCTCGGCT CGATAGTCTT GTCAATATAA 7680 

AACAAAAACA AGATTCTGAT TATCATCTAC AAAGGCATTA ACTCCGTTCT TTATATCCTG 7740 

ACTTTCAAGG AATTCCATAA CGTTTTGAAG ATAGGATTCA TAAAATAGTG GGTAATTATG 7800 

TTTTTTATGG TAATCATCTA AAAATGTTAC CTCAAACTCA CATGGATAAT TGGGCATCAA 7860 

AAATATTTGT TCATCCAGCT GTTTGATTTC TGCATCATGT AATTCTGTTT CTAATTCATC 792 0 

ACAATCTAGT ATTGATTCTT TATTTAATGC TTTTATCTTT TTCCTCTATT TCTTTTAATT 7980 
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TCTTTGCGAT TGCGGCAATC ACAGGAACGG TTACACTATT ACCAACTTGT TTATAGAGCT 8040 

GACTATTAAT AGAGACTTTT CTAGCAGCTT CAAAAGCCTA ATCAGGAAAG CCATGCAATC 8100 

GAAAACACTC TTTAGGAGTG ATTCGTCGTA TTCTCAAACG GTAAAATTGT CCATCTATTA 8160 

AAACACCAGC TACTTGGTAA ACTTGTTTAT CTTCTCCTTC ATAGCTAGCC ACTACTACTC 8220 

CCATTTGACC ACTAGTTGTT AACGTATTAG CTATACCTTT TCCAACTCTA CCACGACGAT 8280 

ACTGAGAACT TGGTCTTTCT AAATTGATTG AATCCCCAAT CTCTGCTTGA GCATATCCTT 8340 

TTTTCGTTGC TTCCCGTACT TTTAGAAATT GGATTGGTTC TGGAATTAGT ATTTTGGGGA 8400 

TTTTATCTCC TCCTTGCATC GTAGTCAGTG TTGGAGATAA GCCCTCACTT CCATAGACAC 8460 

GACCTGTCTC CTTAAAGCTA GTCGGTAAAT CTCCAACAAC GACAATGCCA TAACGATCCT 8520 

GAGTATTTAA AGTAAACATC GGCTCTTGAT TTTCCTTAAA GCGTCTCCCA TTTTGTCTCT 8 5 80 

TGTCTAATCT ATCTGGTGTC ATACAAGGAA TCGCAACTTT AAATCCTTCT CCTTTACCAC 8 640 

GAACTAAGGT TGGCGCAAGA CCTTCTGAAT AATAGACTTT ACCGCTCATT CCACTTCTTG 8700 

ATGGATTCAA ATTTCCTAGT GCTTTCAAAG TCTCAGAGTT AGTTGCTTGA CCTTCTCGTC 87 SO 

TGAAAGGAAA TAAGAGTCTG GTACCTTTCT TTCTAGAATG TCCGATAATA AACACCCTCT 8 820 

CTCTGTTTTT GGGAACGCCA AAATCCTTAC TGTTAAGCAC CTGCCACTCA ACATCAAACC 8 8 80 

CCAACTCATC AAGTGTGGTA AGTATTGTGG TGAACGTCCG TCCCTTATCG TGATTGAGTA 8940 

GGCCTTTAAC ATTTTCAAGA AAAAGAAAAC GTGGTTGGAT TTGTTTGGCC GCCCGAGCAA 9000 

TTTCAAAGAA CAAAGTTCCT CTAGTATCTT CAAATCCCAA TCGTCTTCCT GCGATTGAAA 9060 

ATGCTTGACA AGGGAATCCC CCACAGATGA CATCGACTTT CCCTCTAAGT TTTTTAAATT 9120 

CGTCATCTGA AACATCTCGT ATGTCATGAA ATTCTATTTC TCCTTCCGTT TGAAAAATGG 9180 

ACTTATAAGA TTTCCTAGCA AATTTATCAA TCTCACAAAA TCCCAAGCAC TCATGCCCTT 9240 

GAGCTTCCAT TCCCATCCTA AAGCCTCCTA TCCCAGCAAA TAAATCTAAA ACCCAAATCA 9 3 00 

TTCATACCTC TCTCAACTAG ATGTAACTTA CAAAACCCCT GACCTCATGA GCCACTTTCT 93 60 

TCCTCCTCAT GAGGTCAGTT TTACTTTCTG CTGTTCCAGT ATCGTTTTTC CTCGCTAGAT 9420 

TTCCTCAAAA GGGCAGACTC CTCCCTTGGT TCGTCACACG ATTTTTTCAT CTCGACTGTT 94 80 

CTTTAATGCA TCATTAACGA CGCTTTTCTT CTAGGTGGTT CATAAGGAAC AGGAAGATTC 9540 

AGGTTGACTT TTCTAATCCT AGAATAAAGT GCTGAAAACA ATTCGGAATA GGCATAGAGA 9 600 

CTAGACAATT TGAGGAGCTG CTTGCGTCCT GTTCGAACAC ATTTTCCTAC CACGTGAAGA 9660 

AAAAGATGGC GGAAGCGTTT GATTGTTAAA GTTTGGAAGT CACCTCCAGC TAGATGTTTG 9720 
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AGAAAAAGAT AGAGATTGTA GGCGATACAG CTCATCATCA TACGAACTCG TTTTTGATTA 9780 

AGGTTGAACT ATCCGTTTTA TCGCCAAAAA ATCCCTCCTT CATCTCCTTG ATGAAATTCT 9840 

CGGCTTGACC ACGTCCACGA TAAAGCTGAA ACTGGTCTTG GCTTGTTCCG GTACCGA 9 897 

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8148 base pairs 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
CCGTGGAACA AGCCAAGACC AGTTTCAGCT TTATCGTGGA CGTGGTCAAG CCGAGAATTT 60 
CATCAAGGAG ATGAAGGAGG GATTTTTTGG CGATAAAACG GATAGTTCAA CCTTAATCAA 120 
AAACGAAGTT CGTATGATGA TGAGCTGTAT CGCCTACAAT CTCTATCTTT TTCTCAAACA 180 
TCTAGCTGGA GGTGACTTCC AAACTTTAAC AATCAAACGC TTCCGCCATC TTTTTCTTCA 2 40 

CGTGGTAGGA AAATGTGTTC GAACAGGACG CAAGCAGCTC CTCAAATTGT CTAGTCTCTA 300 
TGCCTATTCC GAATTGTTTT CAGCACTTTA TTCTAGGATT AGAAAAGTCA ACCTGAATCT 3 60 

TCCTGTTCCT TATGAACCAC CTAGAAGAAA AGCGTCGTTA ATGATGCATT AAAGAACAGT 420 
CGAGATGAAA AAATCGTGTG ACGAACCAAG GGAGGAGTCT GCCCTTTTGA GGAAATCTAG 4 80 

CGAGGAAAAA CGATACTGGA ACAGCAGAAA GTAAAACTGA CCTCATGAGG AGGAAGAAAG 540 
TGGCTCATGA GGTCAGGGGT TTTGTAAGTT ACATCTAGTT GAGAGAGGTA TGAATGATTT 6 00 

GGGTAAATAC AATGAGCTTG AAAGAAGTAG CAAACTCACC AAGCGCCAAT TCTTTGAGAA 6 60 

TCAGATGCTG GATTATACCA TCATTGCGCA TGAGAGTTTT GAAATCATCC GTCATTCTGT 72 0 

CTACCAGACA GATGATCGTG AAGTGGAAAA TGCTCTGGCT TTTGAAGTGA AAAATGATGA 7 80 

AACAGACAAG CTGATTCTGT TATTAAGCGA GGATATTGGT GTAGGTGAAA AATTGTGCCT 84 0 

CGTTGACGGA ACAAAAATGC GTGGAAAATG TTTAGTATAT GATAAAATAA ATGAGAGAAT 9 00 

GATTCGCTTG CAGTGCTAGA AATAGGCATT TTGAATAGTG AATATGTTAT AATAAGTATT 9 60 

AGTAGGAGGT GTTTTAGATT GGAGAAGAAA CTGACCATAA AAGACATTGC GGAAATGGCT 102 0 

CAGACCTCGA AAACAACCGT GTCATTTTAC CTAAACGGGA AATATGAAAA AATGTCCCAA 1080 

GAGACACGTG AAAAGATTGA AAAAGTTATT CATGAAACAA ATTACAAACC GAGCATTGTT 1140 

GCGCGTAGCT TAAACTCCAA ACGAACAAAA TTAATCGGTG TTTTGATTGG TGATATTACC 12 00 

AACAGTTTCT CAAACCAAAT TGTTAAGGGA ATTGAGGATA TCGCCAGCCA GAATGGCTAC 12 6 0 
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CAGGTAATGA TAGGAAATAG TAATTACAGC CAAGAGAGTG AGGACCGGTA TATTGAAAGC 1320 

ATGCTTCTCT TGGGAGTAGA CGGCTTTATT ATTCAGCCGA CCTCTAATTT CCGAAAATAT 13 BO 

TCTCGTATCA TCGATGAGAA AAAGAAGAAA ATGGTCTTTT TTGATAGTCA GCTCTATGAA 1440 

CACCGGACTA GCTGGGTTAA AACCAATAAC TATGATGCCG TTTATGACAT GACCCAGTCC 1500 

TGTATCGAAA AAGGTTATGA ACATTTTCTC TTGATTACAG CGGATACGAG TCGTTTGAGT 15 60 

ACTCGGATTG AGCGGGCAAG TGGTTTTGTG GATGCTTTAA CAGATGCTAA TATGCGTCAC 162 0 

GCCAGTCTAA CCATTGAAGA TAAGCATACG AATTTGGAAC AAATTAAGGA ATTTTTACAA 168 0 

AAAGAAATCG ATCCCGATGA AAAAACTCTG GTATTTATCC CTAACTGTTG GGCCCTACCT 1740 

CTAGTCTTTA CCGTTATCAA AGAGTTGAAT TATAACTTGC CACAAGTTGG GTTGATTGGT 1800 

TTTGACAATA CGGAGTGGAC TTGCTTTTCT TCTCCAAGTG TTTCGACGCT GGTTCAGCCC 18 6 0 

TCCTTTGAGG AAGGACAACA GGCTACAAAG ATTTTGATTG ACCAGATTGA AGGTCGCAAT 192 0 

CAAGAAGAAA GGCAACAAGT CTTGGATTGT AGTGTGAATT GGAAAGAGTC GACTTTCTAA 19 80 

AATGAAGGAA AATGACTTGC AATCTCTGTT AAGAAATAAA ATAATCCCAC CTAGAACAAG 2040 

CTAGGTGGGA TTATTTGCCT ATGAAATGAG AAATTATGGG AGCAAGCTCC TAAATCAACT 2100 

GTTTTTGATC TACTTCTTTA ACTACTTGAT AAAAGTTATA GAAGTAGGCC AAACTTGAAA 2160 

TGATGGTTAC GACTAGGAAT ATTGAAAATT TCCATTGGAC AGGGTTGGTT AAAAGTTGTG 2220 

GAAAGGATAT GAGGAGAAAG AAGAGGGCTG CGTTGAGGAC AGGTATCCGT TTTGATTGTA 22 80 

TTTTCTCAAG TCCTTTATTG AGCGCAGGAA GAAAGAGGAG TAGGAGTAGT AAAACTGTAT 23 40 

GAGAAATAGC TCCTGAAGTA AGGGCGAAGA AAAGGAAAAT ACTGATAAAA ACATGAATGA 24 00 

TCAGTAGTCT AGCTAGTGAT TTCATAAGGC ACCTCCTAAT CCTGGTCTTT TTTAGCTCTT 24 6 0 

GCAATACGAA GTGAGTCGAC AATATGTATC ATCACTCCGA AAAAGAAAGC TCCCAGTATA 252 0 

GTTTTAAAAA TATGTTTTGT ATTTAGAAGA GAACTGATAA AATTTGGATT TTCACTTGTT 2 5 80 

AGGGTATCAA TGAGTGGAAT TATAAAAAAT ATCACTGTTC CATAAATCGA ACCTGCTTTC 2 640 

AGACCAGGAT AACGTAACTG TTTCTTTTCT TTTTTCATGA GTTTCCTCCT AATCCTCATC 27 00 

TTGATTTTTC TTAGTTTTTG CAATGCGACG GGAGATGAGG AACTGTATGC TCGCTCCGAA 27 6 0 

GAAAATAGAA CC G AG AATAC TTGATACACC ATTTCTTATA GTGAGAAGAG AATGAAAATA 2 820 

GTCCTGACCT TCATCTATGA GTATCCTGAG AAGAGGAGTT ATAAAAAACA TCCATAGACC 28 80 

AAAGAACAAA CCTGCTTTCA GACCTGGGTA GTGTAGTTGC TTGCTTTCTT TCTCATTCAG 294 0 

CATATCTGGT TCAATGACTG TGATGCCTGT TTTTTTCATT TGGTAGGTGA CATAGCCAGA 3000 
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AGCGATGAGG GCAATCACTA AAATCAGAGG AGGATAGATT AGAGCCACTT CTTGAGGGTA 3 0 60 

TTTATAGGCC AGAAGGAGTG GAATAAGATT TCCGAAAATC ATCAGATAAA AGAGGATGAT 312 0 

AAAGACTTGG TTCCCAATAC TATCGGCCTC ACGCCGTTTG TATTCGTCAA GGGGACCAGA 3180 

AATACCGTAT GTGCGTTTGA TCAGTTTTTC AGTGAAGGTT TCTTTTTTCA TGAGTTTGCT 3240 

CCTTTTTTAA AAATCTTCCT CCCAAAAGAG ACTGTTGAGG TCAGTTTGGA GGCTGCGGGC 3300 

GAGATTGAGA CAGAGTTCCA AGGTTGGATT GTACTTGTCG TTTTCAATCA TATTGATAGT 33 60 

CTGTCTCGAG ACACCGATAT CCTTGGCGAG TTCGAGCTGG GAAATACCCA ATTCCTTGCG 342 0 

AAATTCTTTC ACACGATTCA TCTGTTCTCC TTTCTGATTT ATGTCGTATA TATTTGACTA 3480 

TATTATAGTC TTTTAAACAT AAAGTGTCAA GTATTTTTGA CATATTTTTT GAAGAAATAG 3540 

TAGTCTCCTT GTCCTATTTG TCTGACAAGT GCAAGCTGGT CGGATTTGTG GTAAAATAGA 3600 

TAAGATATGA CAAAAGAATT TCATCATGTA ACGGTCTTAC TCCACGAAAC GATTGATATG 3 6 60 

CTTGACGTAA AGCCTGATGG TATCTACGTT GATGCGACTT TGGGCGGAGC AGGACATAGC 3720 

GAGTATTTAT TAAGTAAATT AAGTGAAAAA GGCCATCTCT ATGCCTTTGA CCAGGATCAG 37 8 0 

AATGCCATTG ACAATGCGCA AAAACGCTTG GCACCTTACA TTGAGAAGGG AATGGTGACC 3 840 

TTTATCAAGG ACAACTTCCG TCATTTACAG GCATGTTTGC GCGAAGCTGG TGTTCAGGAA 3900 

ATTGATGGAA TTTGTTATGA CTTGGGAGTG TCTAGTCCTC AATTAGACCA GCGTGAGCGT 39 60 

GGTTTTTCTT ATAAAAAGGA TGCGCCACTG GACATGCGGA TGAATCAGGA TGCTAGCCTG 4020 

ACAGCCTATG AAGTGGTGAA CAATTATGAC TATCATGACT TGGTTCGTAT TTTCTTCAAG 408 0 

TATGGAGAGG ACAAATTCTC TAAACAGATT GCGCGTAAGA TTGAGCAAGC GCGTGAAGTG 4140 

AAGCCGATTG AGACAACGAC TGAGTTAGCA GAGATTATCA AGTTGGTCAA ACCTGCCAAG 4200 

GAACTCAAGA AGAAGGGGCA TCCTGCTAAG CAGATTTTCC AGGCTATTCG AATTGAAGTC 42 60 

AATGATGAAC TGGGAGCGGC AGATGAGTCC ATCCAGCAGG CTATGGATAT GTTGGCTCTG 432 0 

GATGGTAGAA TTTCAGTGAT TACCTTTCAT TCCTTAGAAG ACCGCTTGAC CAAGCAATTG 43 80 

TTCAAGGAAG CTTCAACAGT TGAAGTTCCA AAAGGCTTGC CTTTCATCCC AGATGATCTC 4440 

AAGCCCAAGA TGGAATTGGT GTCCCGTAAG CCAATCTTGC CAAGTGCGGA AGAGTTAGAA 4500 

GCCAATAACC GCTCGCACTC AGCCAAGTTG CGCGTGGTCA GAAAAATTCA CAAGTAAGAG 4560 

GGAAAAAGAT GGCAGAAAAA ATGGAAAAAA CAGGTCAAAT ACTACAGATG CAACTTAAAC 462 0 

GGTTTTCGCG TGTGGAAAAA GCTTTTTACT TTTCCATTGC TGTAACCACT CTTATTGTAG 4680 

CCATTAGTAT TATTTTTATG CAGACCAAGC TCTTGCAAGT GCAGAATGAT TTGACAAAAA 4740 

TCAATGCGCA GATAGAGGAA AAGAAGACCG AATTGGACGA TGCCAAGCAA GAGGTCAATG 4800 
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AACTATTACG TGCAGAACGT TTGAAAGAAA TTGCCAATTC ACACGATTTG CAATTAAACA 4860 

ATGAAAATAT TAGAATAGCG GAGTAAGATA TGAAGTGGAC AAAAAGAGTA ATCCGTTATG 4920 

CGACCAAAAA TCGGAAATCG CCGGCTGAAA ACAGACGCAG AGTTGGAAAA AGTCTGAGTT 49 80 

TATTATCTGT CTTTGTTTTT GCCATTTTTT TAGTCAATTT TGCGGTCATT ATTGGGACAG 504 0 

GCACTCGCTT TGGAACAGAT TTAGCGAAGG AAGCTAAGAA GGTTCATCAA ACCACCCGTA 5100 

CAGTTCCTGC CAAACGTGGG ACTATTTATG ACCGAAATGG AGTCCCGATT GCTGAGGATG 5160 

CAACCTCCTA TAATGTCTAT GCGGTCATTG ATGAGAACTA TAAGTCAGCA ACGGGTAAGA 522 0 

TTCTTTACGT AGAAAAAACA CAATTTAACA AGGTTGCAGA GGTCTTTCAT AAGTATCTGG 5280 

ACATGGAAGA ATCCTATGTA AGAGAGCAAC TCTCGCAACC TAATCTCAAG CAAGTTTCCT 534 0 

TTGGAGCAAA GGGAAATGGG ATTACCTATG CCAATATGAT GTCTATCAAA AAAGAATTGG 5400 

AAGCTGCAGA GGTCAAGGGG ATTGATTTTA CAACCAGTCC CAATCGTAGT TACCCAAACG 54 60 

GACAATTTGC TTCTAGTTTT ATCGGTCTAG CTCAGCTCCA TGAAAATGAA GATGGAAGCA 552 0 

AGAGCTTGCT GGGAACCTCT GGAATGGAGA GTTCCTTGAA CAGTATTCTT GCAGGGACAG 5580 

ACGGCATTAT TACCTATGAA AAGGATCGTC TGGGTAATAT TGTACCCGGA ACAGAACAAG 564 0 

TTTCCCAACG AACGATGGAC GGTAAGGATG TTTATACAAC CATTTCCAGC CCCCTCCAGT 5700 

CCTTTATGGA AACCCAGATG GATGCTTTTC AAGAGAAGGT AAAAGGAAAG TACATGACAG 5760 

CGACTTTGGT CAGTGCTAAA ACAGGGGAAA TTCTGGCAAC AACGCAACGA CCGACCTTTG 5820 

ATGCAGATAC AAAAGAAGGC ATTACAGAGG ACTTTGTTTG GCGTGATATC CTTTACCAAA 5880 

GTAACTATGA GCCAGGTTCC ACTATGAAAG TGATGATGTT GGCTGCTGCT ATTGATAATA 594 0 

ATACCTTTCC AGGAGGAGAA GTCTTTAATA GTAGTGAGTT AAAAATTGCA GATGCCACGA 6000 

TTCGAGATTG GGACGTTAAT GAAGGATTGA CTGGTGGCAG AACGATGACT TTTTCTCAAG 606 0 

GTTTTGCACA CTCAAGTAAC GTTGGGATGA CCCTCCTTGA GCAAAAGATG GGAGATGCTA 6120 

CCTGGCTTGA TTATCTTAAT CGTTTTAAAT TTGGAGTTCC GACCCGTTTC GGTTTGACGG 6180 

ATGAGTATGC TGGTCAGCTT CCTGCGGATA ATATTGTCAA CATTGCGCAA AGCTCATTTG 6240 

GACAAGGGAT TTCAGTGACC CAGACGCAAA TGATTCGTGC CTTTACAGCT ATTGCTAATG 6300 

ACGGTGTCAT GCTGGAGCCT AAATTTATTA GTGCCATTTA TGATCCAAAT GATCAAACTG 6360 

CTCGGAAATC TCAAAAAGAA ATTGTGGGAA ATCCTGTTTC TAAAGATGCA GCTAGTCTAA 6420 

CTCGGACTAA CATGGTTTTG GTAGGGACGG ATCCGGTTTA TGGAACCATG T ATAAC C AC A 6480 

GCACAGGCAA GCCAACTGTA ACTGTTCCTG GGCAAAATGT AGCCCTCAAG TCTGGTACGG 6540 
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CTCAGATTGC TGACGAGAAA AATGGTGGTT ATCTAGTCGG GTTAACCGAC TATATTTTCT 6600 

CGGCTGTATC GATGAGTCCG GCTGAAAATC CTGATTTTAT CTTGTATGTG ACGGTCCAAC 6660 

AACCTGAACA TTATTCAGGT ATTCAGTTGG GAGAATTTGC CAATCCTATC TTGGAGCGGG 672 0 

CTTCAGCTAT GAAAGACTCT CTCAATCTTC AAACAACAGC TAAGGCTTTA GAGCAAGTAA 678 0 

GTCAACAAAG TCCTTATCCT ATGCCTAGTG TCAAGGATAT TTCACCTGGT GATTTAGCAG 6840 

AAGAATTGCG TCGCAATCTT GTACAACCCA TCGTTGTGGG AACAGGAACG AAGATTAAAA 6900 

ACAGTTCTGC TGAAGAAGGG AAGAATCTTG CCCCGAACCA GCAAGTCCTT ATCTTATCTG 6960 

ATAAAGCAGA GGAGGTTCCA GATATGTATG GTTGGACAAA GGAGACTGCT GAGACCCTTG 7 020 

CTAAGTGGCT CAATATAGAA CTTGAATTTC AAGGTTCGGG CTCTACTGTG CAGAAGCAAG 7 080 

ATGTTCGTGC TAACACAGCT ATCAAGGACA TTAAAAAAAT TACATTAACT TTAGGAGACT 7140 

AATATGTTTA TTTCCATCAG TGCTGGAATT GTGACATTTT TACTAACTTT AGTAGAAATT 7200 

CCGGCCTTTA TCCAATTTTA TAGAAAGGCG CAAATTACAG GCCAGCAGAT GCATGAGGAT 72 60 

GTCAAACAGC ATCAGGCAAA AGCTGGGACT CCTACAATGG GAGGTTTGGT TTTCTTGATT 7 320 

ACTTCTGTTT TGGTTGCTTT CTTTTTCGCC CTATTTAGTA GCCAATTCAG CAATAATGTG 7 380 

GGAATGATTT TGTTCATCTT GGTCTTGTAT GGCTTGGTCG GATTTTTAGA TGACTTTCTC 7440 

AAGGTCTTTC GTAAAATCAA TGAGGGGCTT AATCCTAAGC AAAAATTAGC TCTTCAGCTT 7500 

CTAGGTGGAG TTATCTTCTA TCTTTTCTAT GAGCGCGGTG GCGATATCCT GTCTGTCTTT 7560 

GGTTATCCAG TTCATTTGGG ATTTTTCTAT ATTTTCTTCG CTCTTTTCTG GCTAGTCGGT 7620 

TTTTCAAACG CAGTAAACTT GACAGACGGT GTTGACGGTT TAGCTAGTAT TTCCGTTGTG 7 680 

ATTAGTTTGT CTGCCTATGG AGTTATTGCC TATGTGCAAG GTCAGATGGA TATTCTTCTA 7740 

GTGATTCTTG CCATGATTGG TGGTTTGCTC GGTTTCITCA TCTTTAACCA TAAGCCTGCC 7800 

AAGGTCTTTA TGGGTGATGT GGGAAGTTTG GCCCTAGGTG GGATGCTGGC AGCTATCTCT 7 8 60 

ATGGCTCTCC ACCAAGAATG GACTCTCTTG ATTATCGGAA TTGTGTATGT TTTTGAAACA 7 920 

ACTTCTGTTA TGATGCAAGT CAGTTATTTC AAACTGACAG GTGGTAAACG TATTTTCCGT 7 9 80 

ATGACGCCTG TACATCACCA TTTTGAGCTT GGGGGATTGT CTGGTAAAGG AAATCCTTGG 8040 

AGCGAGTGGA AGGTTGACTT CTTCTTTTGG GGAGTGGGAC TTCTAGCAAG TCTCCTGACC 8100 

CTAGCAATTT TATATTTGAT GTAAGAATGG CACCCTGATG TTTCAGGG 8148 
(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9909 base pairs 

(B) TYPE: nucleic acid 
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(C) STRANDEDNESS : double 

(D) TOPOLOGY : linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

TACTCCACCC TTAATATCCG TTCCTGTAAA TACTTTACCG CTTTTAAGTT CATAGAATTG 60 

AACTTTTAAA TGCTTGTCTT CAAGCATCTT TTCCATCCAA TTTTTAGGAG TTTGACCAGC 12 0 

TTTAAATAAA AACCTTGCTG GGGTGATTAG TATAGATTTA TCTGCGATTT TATAAGCTTC 180 

ATCAATAAAA TAGTGATATA TCGGCTCATC TCTGGCTTCT CCTGTTTCCT GATACGGAGG 240 

ATTTCCTATC ACGACATCAA ATTTCATTTC ACTTTCCTCG L'TAGATAGGC GCTCAAAACC 300 

TATCATTCTA TTCTTTTTCC AGTCTTTGAT ATGGGTTTTA GATTCTTCTA CTTCTTGGAC 3 60 

TTCTAGCTCA TCCGCAAACA AACTCAATTG TTGAGATTGC TTTTGTTTAG CTGAATAAGG 42 0 

ACTACTTTTT TTCAATCCAT CCATCTGAAA GACATTGTAA GAGATAATAG TCGCAATTTC 480 

TTTCTTTTGC TCTAATGTTG GTTGATTTCC AGTCTTAGCT AGATAATAGT CCTCAAAAGT 54 0 

TGCCAAAAGA TTCTCACGCG CCAAAAGGAG AGAATCTOCT TGATACTCAT AACCATACGA 600 

AGCATGATAA GCATCTTTTA CAAGTTTATA AAATGTGACT TCATCTGAAA CCTCACGACT 660 

AATCCGTTGC AGTTTTCTAT CAACAAAACC AACTCGCTCA GATAATGGAA TTTCCTCACC 72 0 

AGTTACGGTA TCATATCTCG TTACCATATA AGGTGCTTCA CCACAAGTTA CCTCTAACCA 780 

TCGTAAGTCC ACATACTCCT CAAGACTTAA CGAGCCTAAT TTCGATTCTA CATATCCATT 84 0 

TTGCTTTGCG ACCAACCACG TTGGTGTAAA CACTTCTGCC CTTATTTTTG TCCGATCTTT 900 

TTGTTCATAT TTGGATTTTT CAGATCTGGG CTGAATCAAG TTGGCAAAGT TTCCAGTAAC 960 

CTTACTTGGA TTGATGCGAT CACTTGGAGC AAATCCCTTT C CTAACAATT CATAAGAATG 102 0 

CGTAnGCCAA ACAATTGATT TCTTTGTCGT TCGATCTTTT AAAAGAATTT TTAATAAGTC 1080 

AGCCGATTCT TTAGCCAAAC TTTCTTCACT AATATCTATT GTCATCAGCA ACCTCTCTTA 114 0 

TATTGTAAGC CCTATTATAT CATATTTTAA AGAATGAAAA TTTACTTGAA AAAAGTAATT 1200 

CAATAAATAT CTCTCCGATG ACCAACTTCT AGAGTAGCAA CGACTAATTC ATCATCTACA 1260 

ATTTGTACGA TAACTCGATA ATTACCAATT CTATAGCGCC ATTGACCAAC GCGATTACCA 1320 

ACCAAAGCCT TTCCGTGTCG TCTTGGGTCT TCCAAAACAT TGGTTTGTAA ATAGTTTGTA 1380 

ATTAGCTTCT GCGTATAACG GTCCAATTTT TTCAATTGCT TGATAAAACG TCTTGTTGGA 1440 

ACTAATTTAT ACAAATTATT CATCCTTCAA GCCTAAATCA TGCATCATTT CTTCCCAAGT 1500 

AATGGGTTCA ACTCCTTTTT CCAAGTCTTC TAAATACTCT TGATAGGCTA AATCTGCCAC 1560 
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ACGAGCATCG TATTCATCTT CTAGGGCTTC AAGAGTTTTG GTGCGAATAA GTTCCGAAAG 162 0 

GGAAACTCCT TCAAACTTAG CCATTGCTTT CATAAATGTT TTATCAGCTT CAGAAACTTT 1680 

TAATGTAATA GTAGTCATCT TTTGTGCTCC CTTTTTTAAT GGTAACACCA TTGTATTACT 174 0 

TTTTAGGTGT TCAGTCAATA TAAAAAGAAC ACCTTCTCAG CGTTCTTTCT ATATCTCTGT 1800 

CAATGGTGTT GCGGTATCTG GTGAGGTATC ATAAACCTTA AAGTCTACTC CGACTCCCAG 1860 

ATCAGCTTGA GCCAGCTGAT TGACCATGGT CATATGAGCC AGTTCCTTGA TATTGTTTTC 192 0 

CTTAGATAAA TGCCCAAGGT AAATCTTCTT AGTACGATTT CCTAGCGTCC GAATCATAGC 1980 

TTCAGCACCG TCCTCGTTAG AAAGGTGACC AAGGTCAGAT AGGATTCGTT GTTTGAGTCG 2040 

CCAAGCGTAA GAACCTGATC GCAAAATCTC TACATCATGG TTGGCCTCGA TAAGATAACC 2100 

ATCCGCATTT TCGACAATGC CCGCCATACG GTCACTGACA TAACCTGTAT CTGTCAAGAG 2160 

GACAAAACTC TTATCATCCT TCATAAAGCG ATAGAACTGC GGTGCGACTG CATCATGGCT 2220 

TACACCAAAA CTCTCGATGT CGATATCTCC AAAGGTTTTG GTTTTACCCA TTTCAAAAAT 2 280 

ATGCTTTTGC GAAGAATCCA CCTTGCCAAG ATATTTACTA TTTTCCATAG CTTGCCAGGT 2 340 

CTTTTCATTG GCATAAAGAT CCATACCATA CTTGCGAGCC AAAACGCCTA CTCCATGGAT 2400 

ATGATCTGAA TGCTCATGGG TAATCAAGAT GGCATCCAGG TCTTCTGGCT TACGGTTAAT 2 460 

TTCAGCTAGC AGACTGGTAA TTTTCTTGCC AGACAAGCCT GCATCTACTA AAAGCTTCTT 2 520 

TTTTGAGGTT TCCAGATAAA AAGAATTTCC ACTGGAACCC GACGCTAAAA TACTGTATTT 2580 

AAAGCCTATT TCACTCATTC TAGTCTTCTA CTTCATCCTC CCATACTTCT TCTTTCACTG 2 640 

CATCCTTATC ATAAGGGAGT ACAATGGTAA AGGTTGAACC CTTGCCGTAT TCACTCTTGG 2700 

CCCAAATAAA GCCCTTATGT TGTTTGATAA TTTCTTTAGC GATAGACAGT CCTAGACCTG 2 760 

TACCACCTTG TGCACGACTT CTAGCACGAT CCACACGATA GAAACGGTCA AAGATACGTG 2 820 

GTAAATCCTG CTTAGGAATC CCCAAACCGT GGTCAGAAAT GGATAAAATC ATCTGGTCTT 2 880 

CAGTTGTCTT CATTCTGACA GTGATTTTAC CCCCATCTGG CGAATACTTA ATAGCATTAT 2 940 

TTAAAATATT GTCGACAACC TGCGTCATCT TATCTGTATC AATTTCCATC CAGATAGAAT 3 000 

TGATGGGATA ATCTCTCACC AACTCATATT TTTTCTCCTT TTCCTGTCCT TTCATCTTGT 3 060 

CAAAACGATT GAGGATAAAG GTAATAAAAG CAGTGAAGTT AATCAGTTCC ACATCTAGGT 3120 

GACTGGTAGC ATTATCAATA CGTGAAAGAT GGAGGAGATC CGTCACCATG CGCATCATAC 3180 

GGTTGGTCTC ATCAAGAGAA ACCTTGATAA AGTCTGGTGC TACAGTTTCA CACAAAGCCC 3240 

CCTCATCCAA GGCTTCAAGA TAGGATTTTA CGCTAGTCAG AGGAGTCCGT AACTCATGGC 3 3 00 

TAACATTGGA AACAAAGAGT CTTCGTTCGC GTTCTTCCTT CTCCTGCTCC GTCGTATCAT 3 3 60 
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GCAAAACAGC CACCAAACCT GAAATAAAGC CAGACTCTCG ACGTATCAAG GCAAAGCGAA 3420 

CTCGAAGGTT CAAATATTCG CCATTGATAT CTTGGGAATC TAGCAACAAT TCTGGACTTT 3 480 

GGGTAATCAA ATCACGCAAT TCATAGTTTT CTTCTATCTT GAGCAATTCC AAAATGCTTC 3540 

TATTCAGAAC ATCTTCCTTA ACCAACCCCA GTTGCTTCTT GGCTGTATCG TTAATCATGA 3 600 

TAATCTGACC CCGACGGTTA GTCGCAAGAA CCCCATCTGT CATATAAAAC AGAATACTAT 3 660 

TTAGCCTCTT ACTCTCTTGT TCTAGATTTT CCTGAGTGAG ACGAATAACC TCCGACAAGT 3720 

CATTCAAATT ATTGGTAATA TTGGTGATTT CAGACCCACC TTGCATATCA AGAACCTTGG 3 780 

AATAATCTCC TGCAATCAAA TCTTTAACCT TTTGATTGAC TTGCTTCAAC TGAATATTAT 3840 

CACGTCTATT TTCCAGTAAT AAGAGGGTCA CAACAAGGAT GAAACCTAAC AAAATCAGGA 3 900 

TAAAGATAAA ATCTCTGGTA AAAATGGTTT GTTTCAGTAA ATCAAGCATT ATTTCTCATG 3 960 

TAATACCCTA CACCACGGCG CGTCAAGATA TACTCTGGTC GGCTGGGCGT ATCTTCAATC 4020 

TTCTCACGCA GACGTCGTAC AGTCACATCA ACTGTACGGA CATCACCAAA ATAGTCATAA 4080 

CCCCAGACAG TCTCAAGCAA GTGTTCGCGC GTGATGACTT GACCTGTATG CGATGCTAAA 4140 

TGATACAAAA GCTCAAATTC ACGATGGGTT AAGTCTAGTT CTTCGCCATA TTTTTTAGCC 42 00 

ACGTAGGCGT CTGGAACAAT TTCTAAATCC CCAATTTGGA TAGGTTGAGG TTTACTATCT 42 60 

GCTTCCTGAC CATCTACTGG CATAGGTTGA GAACGACGCA GAAGAGCTTT AACACGCGCC 4320 

TGCAACTCAC GATTGGAGAA GGGTTTTGTT ACATAGTCAT CTGCCCCAAG TTCCAAACCG 43 80 

ATAACCTTAT CAAATTCACT ATCTTTGGCT GAAAGCATAA GAATGGGCAC ACTGCTTGTC 4440 

TTACGAATGG TCTTAGCAAC TTCTAAACCA TCAATTTCTG GAAGCATCAA ATCCAGAATA 4 5 00 

ATAATATCTG GTTGCTCTGC TTCAAATTGC TCTAGCGCTT CACGACCATT AAAAGCAGTT 4 5 60 

ACAACTTCGT AACCTTCCTT GGTCATATTA AACTTGATAA TATCCGAGAT TGGTTTCTCA 4 620 

TCATCTACAA TTAGTATTTT TTTCATATGT TCACCTTTTT CTCTACTATT ATACCAAAAA 4 6 80 

AATAGTCAGA AGACACAATA GCTAGTCTTG GCTACTGTCT AAGTTGGCTT GTGCATAAAC 4740 

CTGCCAGATT TTTTGTTGGG GTTTGGCAAG TGGGTAATTC TTGAATTCTT CTGGTGAAAG 4800 

CCAGCGAACT TCCCTATCTG AAAAATCATG GAAGTCACTC ACCTGACCTG CTACAATCTG 4 8 60 

TACATGCCAT TTTCGATGAC TAAAAACATG CTGGACTGTA TCAAAACAAA CATCAAGCCA 492 0 

ATCAACATCT AGGTCATAGT CCTGCTGGAA ACTCTCTTCT GGACTGGGAC CAAAGTTCAC 4 9 80 

ACTTTCTTCC GCAACCTGAT GAAAGAGGTC AAACTGCTCT TCTTGCGAAA AGTTATCAAC 5040 

TTCTATAAAG GGGAAATGCC AAAAACCTGC CAAGAGCTTT TCGCTTTCAT TTTTTTCAAG 5100 
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TAAAAATTGT CCTTGAGAAT TTTTCACAAC TAAGGCTTTA AGATAAATAG GAACCGGCTT 5160 

TTTCTTAGGA GATTTAATTG GATAACGGTC CATGGTTCCA TTCTGATATG CCGCACTAAA 5220 

GTCCTTGACT GGGCTTTCTT CAGGTCTGGG ATTTACAGGA GACTCAATAT CAGACCCTAA 5280 

GTCCATCAAG GCTTGATTAA AATCACCCGG ACGATCCGGA TTAATCAAGA TCTCCATCAT 5340 

TGCCTGAAAA ATTTTTCGAT TACTTGGAAT CCCAATATCG TGGTTGACTT CAAACAGACG 5400 

CGCCAAGACC CGCATGACAT TACCATCTAC AGCTGGCTCA GGCAAGTTAA AAGCAATACT 5460 

GGAAATGGCT CCTGCTGTGT AAGGTCCAAT CCCTTTCAAG CTGGAAATTC CTTCATAGGT 5520 

ATTTGGAAAT TGGCCACCAA AGTCAGTCAT AATCTGCTGG GCTGCAGCCT GCATATTGCG 5 580 

AACTCGAGAA TAATAGCCCA AGCCCTCCCA AGCTTTCAGT AAACTCTCCT CAGGCGCAGT 5 640 

TGCCAGACTT TCGACAGTTG GAAACCAGTC CAAAAATCTT TCGTAGTAAG GGATAACTGT 5700 

ATCCACCCTG GTCTGCTGAA GCATGATTTC AGATACCCAG ATGTGATAAG GATTTTTACT 57 60 

TCTCCTCCAA GGCAAATCTC TTTTGTTTTC ATCATACCAA GCGAGAAGTT TCTCACGGAA 5 820 

AGAAATGACT TTCTCCTCCG GCCACATGAC GATACCGTAT TCTTTCAAAT CTAACATATC 5880 

TCTAGTATAA CACAGAAGGT TTCACCTGTC TTTGTATCTG ATTTATAATA TTTTCAATAG 5 940 

ATAGTATATA ACTTTTCTAT CTACTTATAC TCAATGAAAA TCAAAGAGCA AACTAGGAAG 6000 

CTAGCCGCAG GTTGCTCAAA ACACTGTTTT GAGGTTGTGG ATAGAACTGA CAGAGTCAGT 6 0 50 

ATCATATACT ACGGCAAGGT GAAGCTGACG TAGTTTGAAG AGATTTTCGA AGAGTATAAA 6120 

TCTTATTGAT GAACTGCTTG CAGTCTGAGA AAAAATGAGC TTGGATATTA TTTCCAAACT 6180 

CACTTAAAGT CAATTTCAAT CCACTAGAAC AAGCCTAGTA CAGTTCCATC GCTTTCAACA 6240 

TCCATGTTGA GAGCTGCTGG ACGTTTTGGA AGACCTGGCA TGGTCATAAC ATCACCAGTT 6300 

AAGGCAACGA TGAAGCCTGC ACCTAATTTT GGTACCAATT CACGAATGGT AATTTCAAAG 63 60 

TTTTCTGGTG CTCCAAGCGC ATTTGGATTG TCTGAGAAAC TGTATTGAGT TTTAGCCATA 642 0 

CAGATTGGCA ATTTGTCCCA ACCGTTTTGA ACGATTTGAG CAATTTGTGT TTGAGCTTTC 64 80 

TTCTCAAAGT TCACTTTGCT ACCACGATAG ATTTCAGTGA CAATTTTTTC AATCTTTTCT 654 0 

TGGACAGAAA GGTCATTATC ATACAAACGT TTATAGTTAG CTGGATTTTC AGCAATTGTC 6600 

TTAACAACTG TTTCGGCAAG TGCTACTCCA CCTTCTGCTC CATCAGCCCA GACACTAGCC 6 6 60 

AATTCAACTG GTACATCGAT TGAGGCACAG AGTTC7TTTA AGGCTGCAAT TTCAGCTTCT 672 0 

GTATCAGATA CAAATTCGTT AATAGCTACA ACTGCTGGAA TACCGAACTT ACGGATATTT 67 80 

TCAACGTGGC GTTTCAAGTT AGCAAAACCT GCACGAACTG CCTCTACATT TTCTTCAGTC 6840 

AGAGCGTCTT TAGCCACACC ACCATTCATC TTAAGGGCAC GAAGGGTTGC GACAATAACA 6900 
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ACTGCATCTG GAGATGTTGG CAAGTTTGGT GTCTTGATAT CAAGGAATTT CTCAGCACCA 6 960 

AGGTCCGCAC CAAAACCAGC TTCAGTAACA GTGTAATCAG CCAAGTGAAG GGCTGTTGTC 7 020 

GTCGCCAAAA CAGAGTTACA GCCATGAGCG ATATTGGCAA ATGGACCACC GTGTACAAAG 7080 

GCAGGTGTAC CGTAAATTGT CTGAACCAAG TTTGGCTTAA TAGCATCCTT CAAAATCAAA 7140 

GCCAAGGCAC CCTCAACCTG CAAATCACCT ACAGAAACAG GCGTACGGTC ATAGCGATAA 7 200 

CCAATAACGA TATTCGCCAA ACGACGTTTC AAGTCCTCGA TGTCCGTTGC CAAGCAAAGA 7260 

ATTGCCATGA TTTCTGAAGC AACTGTAATA TCAAAACCAT CCTCACGTGG AATACCGTTT 73 20 

AGAGGACCAC CAAGACCAAC AGTCACATGG CGGAGCGTAC GGTCGTTCAA GTCCACAACG 7 3 80 

CGTTTCCAGA GGATACGACG TTGATCAATT CCCAGCTCAT TCCCTTGGTG CAAGTGGTTG 7440 

TCAATCAAGG CAGAAAGGGC ATTGTTGGCA GTTGTAATAG CATGCATATC TCCAGTAAAG 7 5 00 

TGGAGGTTGA TGTCTTCCAT TGGCAGAACT TGTGCATACC CACCACCAGC AGCACCACCC 7 5 60 

TTGATCCCCA TGACTGGACC AAGAGACGGT TCGCGGATAG CAATCATGGT TTTCTTGCCA 7 62 0' 

ATCTTGTTCA AGGCATCCGC AAGACCAATG GTAAGCGTCG ACTTTCCTTC ACCTGCAGGT 7 6 80 

GTTGGGTTGA TGGCAGTAAC CAAGATCAAT TTACCGACTG GATTGCTCTC AACTGCACGA 774 0 

ATTTTATCAA AGCTGAGTTT AGCCTTGTAC TTTCCGTACA ACTCCAAATC GTCATAAGAA 7 800 

ATACCAAGTT TCTCTACAAC ATCAACAATT GGCTTCAACT CAATACTCTG TGCGATTTOA 7 8 60 

ATATCTGTTT TCATTCAAAA TTCCTCTAAC CTCTTATATG ATAATTCATT ATATCACAAA 7 920 

ACAAGATTTT TAACATCCTA AAACTCTCTA AACGTTCGTA AATATCTCTG TTTTTAAGAC 7 980 

TTTTAGAGTC CTTTCTTAAA TTTTATATGG CTTTATAGTT TGAAACTATA ATAAATCTTC 804 0 

GTTTTTACCA AAAATTTATC ACTTTCATTT TACTTACCGC TTATTTTTGT GTACAATAGT 8100 

GCTATGAAAA TTTTAGTTAC ATCGGGCGGT ACCAGTGAAG CTATCGATAG CGTCCGCTCT 816 0 

ATCACTAACC ATTCTACAGG TCACTTGGGG AAAATTATCA CAGAGACTTT GCTTTCTGCA 82 2 0 

GGGTATGAAG TTTGTTTAAT TACGACAAAA CGAGCTCTGA AGCCAGAGCC TCATCCTAAC 8280 

CTAAGTATTC GAGAAATTAC CAATACCAAG GACCTTCTAA TAGAAATGCA AGAACGTGTT 834 0 

CAGGATTATC AGGTCTTGAT CCACTCAATG GCTGTTTCTG ACTACACTCC TGTTTATATG 84 00 

ACAGGGCTTG AGGAAGTTCA GGCTAGCTCC AATCTAAAAG AATTTTTAAG CAAGCAAAAT 84 60 

CATCAGGCCA AGATTTCTTC AACTGATGAG GTTCAGGTTT TGTTCCTTAA AAAGACACCC 8520 

AAAAT CAT AT CCCTAGTCAA GGAATGGAAT CCTACTATTC ATCTGATTGG TTTCAAACTG 8580 

CTGGTTGATG TTACCGAAGA TCATCTGGTT GACATTGCAC GAAAAAGTC T TATCAAGAAT 864 0 
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CAAGCAGATT TAATCATCGC GAATGACCTG ACTCAAATTT CAGCAGATCA GCACCGAGCT 87 00 

ATATTTGTTG AGAAAAATCA GCTTCAAACA GTCCAGACTA AAGAAGAAAT TGCAGAACTC 87 60 

CTCCTTGAAA AAATTCAAGC CTATCATTCT TAGAAAGGAA AACTATGGCA AACATTCTCT 8820 

TGGCTGTAAC GGGTTCAATC GCCTCTTATA AGTCGGCAGA TTTAGTCAGT TCTCTAAAAA 888 0 

AACAAGGCCA TCAAGTCACT GTCTTAATGA CTCAGGCTGC TACAGAGTTT ATCCAACCTT 8940 

TGACACTACA GGTACTCTCA CAGAATCCTG TCCACTTGGA TGTCATGAAG GAACCCTATC 9000 

CTGATCAGGT CAATCATATC GAACTTGGAA AAAAAGCAGA TTTATTTATC GTGGTACCTG 90 60 

CAACTGCTAA CACTATTGCA AAACTAGCTC ACGGATTTGC GGACAACATG G TAAC C AG T A 912 0 

CAGCTCTAGC CCTACCAAGT CATATTCCCA AACTAATAGC TCCTGCTATG AATACAAAAA 9180 

TGTATGACCA TCCAGTAACT CAGAATAATC TGAAAACATT AGAAACTACG GCTATCAGCT 9240 

GATTGCTCCT AAGGAATCCC TACTAGCTTG TGGAGACCAC GGACGAGGAG CTTTAGCTGA 93 0 0 

CCTCACAATT ATTTTAGAAA GAATAAAGGA AACTATCGAT GAAAAAACGC TCTAATATTG 93 60 

CACCCATTGC TATCTTTTTT GCTACCATGC TCGTGATACA CTTTCTGAGC TCACTTATCT 942 0 

TTAACCTTTT TCCATTTCCA ATCAAACCGA CCATTGTTCA TATTCCTGTC ATTATTGCCA 948 0 

GCATTATTTA TGGTCCACGA GTTGGGGTTA CACTTGGATT TTTGATGGGA TTACTTAGCT 9540 

TGACGGTTAA CACGATTACG ATTCTACCGA CAAGCTACCT CTTCTCTCCC TTCGTACCAA 9600 

ACGGAAACAT CTACTCAGCT ATCATTGCCA TCGTCCCACG TATTTTGATT GGTTTAACTC 9660 

CTTACTTAGT CTATAAACTG ATGAAAAACA AGACTGGTCT GATTTTAGCT GGAGCCCTTG 9720 

GTTCcTTGAC AAATACTATC TTTGTCCTTG GAGGAATCTT CTTCCTATTT GGAAATGTTT 9780 

ATAATGGAAA TATCCAACTT CTTCTGGCAA CCGTTATCTC AACAAATTCA ATTGCTGAAT 9840 

TGGTCATTTC TGCAATTCTA ACCCTAGCCA TTGTTCCACG ACTACAAACC TTGAAAAAAT 9900 

AAAAACAGG 9 909 
(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 112 6 base pairs 

(B) TYPE : nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

TAATTTTCAT ATAATAGTAA AATAGAATGT GTGATTCAAT AATCACCTCA AATAGAAAGG 60 

AAATTCTATG TCAAATCTAT CTGTTAATGC AATTCGTTTT CTAGGTATTG ACGCCATTAA 120 
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TAAAGCCAAC TCAGGTCATC CAGGTGTGGT TATGGGAGCG GCTCCGATGG CTTACAGCCT 180 

CTTTACAAAA CAACTTCATA TCAATCCAGC TCAACCAAAC TGGATTAACC GCGACCGCTT 24 0 

TATTCTTTCA GCAGGTCATG GTTCAATGCT CCTTTATGCT CTTCTTCACC TTTCTGGTTT 300 

TGAAGATGTC AGCATGGATG AGATTAAGAG TTTCCGTCAA TGGGGTTCAA AAACACCAGG 3 60 

TCACCCAGAA TTTGGTCATA CGGCAGGGAT TGATGCTACG ACAGGTCCTC TAGGGCAAGG 42 0 

GATTTCAACT GCTACTGGTT TTGCCCAAGC AGAACGTTTC TTGGCAGCCA AATATAACCG 48 0 

TGAAGGTTAC AATATCTTTG ACCACTATAC TTACGTTATC TGTGGAGACG GAGACTTGAT 540 

GGAAGGTGTC TCAAGCGAGG CAGCTTCATA CGCAGGCTTG CAAAAACTTG ATAAGTTGGT 600 

TGTTCTTTAT GATTCAAATG ATATCAACTT GGATGGTGAG ACAAAGGATT CCTTTACAGA 660 

AAGTGTTCGT GACCGTTACA ATGCCTACGG TTGGCATACT GCCTTGGTTG AAAATGGAAC 720 

AGACTTGGAA GCCATCCATG CTGCTATCGA AACAGCAAAA GCTTCAGGCA AGCCATCTTT 780 

GATTGAAGTG AAGACGGTTA TTGGATACGG TTCTCCAAAC AAACAAGGAA CTAATGCTGT 840 

ACACGGCGCC CCTCTTGGAG CAGATGAAAC TGCATCAACT CGTCAAGCCC TCGGT':'GG(-A 90L 

CTACGAACCA TTTGAAATTC CAGAACAAGT ATATGCTGAT TTCAAAGAAC ATGTTGCAGA 960 

CCGTGGCGCA TCAGCTTATC AAGCTTGGAC TAAATTAGTT GCAGATTATA AAGAAGCTCA 1020 

TCCAGAACTG GCTGCAGAAG TAGAAGCCAT CATCGACGGA CGTGATCCAG TCGAAGTGAC 1080 

TCCAGCAGAC TTCCCAGCTT TAGAAAATGG TTTTtCTCAA GCAACT 1126 
(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 2520 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

CCGGCAACAA AAAAGAAAAA ATCAACAGTT AAAAAAAATC TAGTCATCGT GGAGTCGCCT 60 

GCTAAGCCAA GACGATTGAA AAATATCTAG GCAGAAACTA CAAGGTTTTA GCCAGTGTCG 120 

GGCATATCCG TGATTTGAAG AAATCCAGTA TGTCCGTCGA TATTGAAAAT AATTATGAAC 180 

CGCAATATAT TAATATCCGA GGAAAAGGCC CTCTTATCAA TGACTTGAAA AAAGAAGCTA 240 

AAAAAGCTAA TAAAGTTTTT CTCGCGAGTG ACCCGGACCG TGAAGGAGAA GCGATTTCTT 3 00 

GGCATTTGGC CCATATTCTC AACTTGGATG AAAATGATGC CAACCGTGTG GTCTTCAATG 3 60 
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AAATCACCAA GGATGCAGTC AAAAATGCTT TTAAAGAACC TCGTAAGATC GATATGGACT 42 0 

TGGTCGATGC CCAACAAGCT CGTCGGATCT TGGATCGCTT GGTAGGGTAT TCGATTTCGC 480 

CTATTTTGTG GAAGAAGGTC AAGAAGGGCT TGTCAGCAGG TCGCGTTCAG TCCATTGCCC 54 0 

TTAAACTCAT CATTGACCGT GAAAATGAAA TCAATGCCTT CCAGCCAGAA GAATACTGGA 6 00 

CAGTTGATGC TGTCTTTAAA AAGGGAACCA AACAATTTCA TGCTTCCTTC TATGGAGTAG 6 60 

ATGGTAAAAA GATGAAACTG ACCAGCAATA ACGAAGTCAA GGAAGTCTTG TCTCGTCTGA 720 

CGAGTAAAGA CTTTTCAGTA GATCAGGTGG ATAAGAAAGA GCGCAAGCGC AATGCTCCTT 7 80 

TACCCTATAC CACTTCATCT ATGCAGATGG ATGCTGCCAA TAAAATCAAT TTCCGTACTC 840 

GAAAAACCAT GATGGTTGCC CAACAGCTCT ATGAAGGAAT TAATATCGGT TCTGGTGTTC 900 

AAGGTTTGAT T AC C TAT AT G CGTACCGATT CGACTCGTAT CAGTCCTGTA GCGCAAAATG 960 

AGGCGGCAAG CTTCATTACG GATCGTTTTG GTAGCAAGTA TTCTAAGCAC GGTAGCAAGG 102 0 

TCAAAAACGC ATCAGGTGCT CAGGATGCCC ATGAGGCTAT TCGTCCGTCA AGTGTCTTTA 1080 

ATACACCAGA AAGCATCGCT AAGTATCTGG ACAAGGATCA GCTTAAGCTA TATACCCTTA 114 0 

TCTGGAATCG TTTTGTGGCT AGCCAGATGA CAGCGGCCGT TTTTGATACC ATGGCTGTTA 12 00 

AATTGTCTCA AAAAGGGGTT CAATTTGCTG CCAATGGTAG TCAGGTTAAG TTTGATGGTT 1260 

ATCTTGCCAT TTATAATGAT TCTGACAAGA ATAAGATGTT ACCGGACATG GTTGTTGGAG 132 0 

ATGTGGTCAA ACAGGTCAAT AGCAAACCAG AGCAACATTT CACCCAACCG CCTGCCCGTT 1380 

ATTCTGAAGC AACACTGATT AAAACCTTAG AGGAAAATGG GGTTGGACGT CCATCAACCT 1440 

ACGCGCCAAC CATTGAAACC ATTCAGAAAC GTTATTATGT TCGCCTGGCA GCCAAACGTT 1500 

TTGAACCGAC AGAGTTGGGA GAAATTGTCA ATAAGCTCAT CGTTGAATAT TTCCCAGATA 1560 

TCGTAAACGT GACCTTCACA GCTGAAATGG AAGGTAAACT GGATGATGTC GAAGTTGGAA 1620 

AAGAGCAGTG GCGACGGGTC ATTGATGCCT TTTACAAACC ATTCTCTAAA GAAGTTGCCA 1680 

AGGCTGAAGA AGAAATGGAA AAAATCCAGA TTAAGGATGA ACCAGCTGGA TTTGACTGTG 1740 

AAGTGTGTGG CAGTCCAATG GTCATTAAAC TTGGTCGTTT TGGTAAATTC TACGCTTGTA 1300 

GCAATTTCCC AGATTGCCGT CATACCCAAG CAATCGTGAA AGAGATTGGT GTTGAGTGTC 1860 

CAAGCTGTCA TCAGGGACAA ATTATTGAGC GAAAAACCAA GCGTAATCGC CTATTCTATG 1920 

GTTGCAATCG CTATCCAGAA TGTGAATTTA CCTCTTGGGA CAAGCCTGTT GGTCGTGACT 1980 

GTCCAAAATG TGGCAACTTC CTCATGGAGA AAAAAGTCCG TGGTGGTGGC AAGCAGGTTG 2 040 

TTTGTAGCAA AGGCGACTAC GAGGAAGAAA AGATGGCTCT TTGTCAACTG TAGTGGGTTG 2100 

AAGTCAGCTA AGCTCGAGAA AGGACAAATT TTGTCCTTTC TTTTTTGATA TTCAGAGCGA 2160 
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TAAAAATCCG TTTTTTGAAG TTTTCAAAGT TCCGAAAACC AAAGGCATTG CGCTTGATAA 22 2 0 

GTTTGATGAG ATTATTGGTC GCTTCCAATT TGGCGTTAGA ATAGTGTAGT TGAAGGGCGT 22 80 

TGACGATTTT CTCTTTGTCC TTTAGAAAGG TTTTAAAGAC AGTCTGAAAA AGAGGATGAA 23 4 0 

CCTGCTTTAG ATTGTCCTCA ATGAGTCCGA AAAATTTCTC CGGTTCCTTA TTCTGAAAGT 2400 

GAAACAGCAA GAGTTGATAG AGCTGATAGT GATGTTTCAA GTCTTGTGAA TAGCTCAAAA 24 6 0 

GCTTGTTTAA AATCTCTTTA TTGGTTAAAT GCATACGAAA AGTAGGGCGA TAAAAATGTT 2 52 0 
(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10993 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

TTTTCTCGAT AATAACTTCC ACCTTATTAT TTGGGATACC CTCCTCTTCT TCACCACCAC 60 

GTTCATAGTA GTCATCGCGA TAGAGAAAAG CTACGATATC AGCGTCCTGC TCAATAGACC 12 0 

CAGATTCACG AATATCAGAC AAGACCGGTC TCTTGTCCTG ACGTTGTTCT ACACCACGAG 180 

AAAGCTGACT CAGAGCGATT ACTGGAACCT TCAATTCCTT GGCTAGTATT TTCAACTGAC 240 

GAGAAATTTC AGAAACTTCT TGTTGACGAT TTTCTCGACC AGTTCCCGTG ATAAGTTGCA 300 

AATAGTCTAT CAAAATCAAA CCAAGATTTC CAGTTTCTTG AGCCAATTTA CGAGAACGAG 360 

AACGAATCTC TGTAATCCGA ATACCTGGCG TATCATCGAT ATAGATACTG GCGTTAGcTA 420 

GATTACCCTG AGCAATAGTA TATTTTTGCC ACTCCTCATC TGTCAATTGC CCTGTACGGA 480 

TAGAATGTGA CTCCACTAAG CCTTCTGCAG CTAACATACG ATCTACCAAG CTTTCCGCAC 540 

CCATTTCGAG TGAAAAAATA GCAACCGTTT TGTCCAACTT AGTCCCAATG TTCTGAGCGA 600 

TATTCAAGGC AAATGCTGTC TTACCAACTG CTGGACGAGC TGCTAAGATA ATCAACTCCT 660 

CCTCATGAAG TCCTGTTGTC ATATGATCCA AATCACGATA ACCTGTCGCA ATACCTGTAA 720 

TATCGGTCGT TTGTTGCGAG CGAGCTTCCA GATTTCCAAA GTTGAGATTC AACACATCTC 780 

GAATGTTCTT AAACCCGCTT CGATTTGCAT TTTCACTGAC ATCAATCAAC CCTTTTTCTG 840 

CCTGAGCAAT AATTTCATCA GCTGGTTGTG ACGCTTCGTA AGCTTGGTTG ACAGACTCTG 900 

TCAACTTGGC AATTAAACGA CGTAGCATTG CTTTTTCTGC AACAATCTTA GCATAATACT 9 60 

CCGCATTAGC AGAAGTTGGC ACAGAATTAA CAATCTCAAC CAAGTAAGAC AAGCCACCAA 1020 
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TATTCTGTAA ATCACCTTGA TTATCAAGGA TAGTACGAAC CGTTGTTGCA TCTATGGCAT 1080 

CACCACGATC GGATAAATCG ACCATGGCTT GGAAAATCAA ACGATGGGCA TACTTAAAAA 1140 

AGTCCCGAGA CTCAATGTAT TCTCGCACAA AAACAAGTTT ACTCTCATCA ATAAAGATAG 1200 

CCCCTAAAAC GGATTGCTCA GCTAAGATAT CTTGAGGTTG TACTCGTAAC TCTTCTACTT 12 60 

CTGCCATCAG ACTTCCCTTC CTTTTACAAT CTTGTCAAGA AGGTGTAAAC TTATCCTTCT 1320 

TTCACACGAA GATTGATTAC ACTTGTGATA TCTTGATAGA TTTTCACTGG CACATCAATC 13 80 

AAACCAACCG CTCGAATCGG AGCTTGTACT TGAATATGAC GTTTATCAAT CTTAATTCCA 1440 

AATTGCTTTT GCAATTCTTC TGCAATCTTC TTATTGGTAA TAGAACCAAA GGTACGACCA 1500 

TCTGGACCAA CTTTTTCAAC AAATTCTACA ACAGTTTCTT CTGCTTCAAG TTGTGCTTTA 1560 

ATTGCTTTTC CTTCTGCAAT CATCTCAGCG TGAGCTTTTT CTTCCGATTT TTGTTTACCA 162 0 

CGAAGTTCAC CTACAGCTTG AGCAGTCGCT TCTTTGGCTA GATTCTTTTT GATAAGAAAG 1680 

TTTTGCGCAT ACCCTGTTGG TACTTCCTTA ATTTCGCCTT TTTTACCTTT TCCTTTAACA 1740 

TCTGCTAAAA AGATTACTTT CATTCTTCTT TCTCCTTTTC CTTCATTTCA TTTAATACAA 18 00 

TTTCTGTCAG TTTTTCACCT GCTTCTGACA AGGTTACATC TTTAATTTGA GCTGCTGCCA 18 60 

AATTAAAGTG GCCTCCACCG CCTAACTCTT CCATAATCCG TTGTACATTC AGTTTACTAC 192 0 

GACTTCGAGC TGAGATAGAG ATAAATCCTT GTGTATTCTT CGCAAGAACA AAACTCGCTT 1980 

CAATACCTGA CATGGCTAAC ATGGCATCTG CTGCCTTACT AATAACAACT GTATCATAGC 2040 

ATTTCATGTC CTTAGCCTCT GCTATTAGTA CATCTGAACC TAATTTACGC CCCTGTAAAA 2100 

TAAGTTCATT GACCTCACGA TATTCTTCAA AATCTGTCGC AGCGATTTCC TGGATAGCAA 2160 

TACTATCACT TCCGCGCGTT CTGAGATAGC TAGCAACATC AAATGTCCGA CTAGTTACTC 22 2 0 

GCGAGGTGAA ATTTTTAGTA TCCAACATCA TACCAGCCAT CAAGACACTT GCTTGCATAC 22 8 0 

GACTCAAACG ATTTTTCTTA GAATTCTGGA ACTGAATCAA TTCCGTTACC AACTCACTGG 2340 

CACTACTTGC ACCACTTTCG ATATAAGTAA TAACCGCATT ATCTGGAAAA TCCTGATCCC 2400 

TTCTATGGTG GTCAATAACA ATGGTTTGGG TAAATAAATC ATAAAATTCT TTTGATAATG 2460 

TTAAGGCTGT CTTTGAATGG TCTACAAGAA TCAACAAAGA ACGATTGGTC ACCATCCCCA 252 0 

TTGCATCCTT AACAGACAAC AACTTCGTAA CTCCTTCTTT TTCTATGAAT GAAACAGCTC 2580 

GTTCAATATC TGGAGACATT TGTTCTTCAT CATAAAGAGC ATAGCTATTT TCAATCACAT 2640 

TGCTGGCGAA CAACTGCATA CCTACAGCAG AGCCCAAAGC ATCCATGTCT AAATTTTTGT 2700 

GACCGACTAC AAAAACCTGA TCTACACTCC GAATCTTATC TGAAATAGCT GTCATCATAG 2760 

CGCGCGTACG AGTCCGTGTA CGCTTGATTG AAGCAGCAGA CCCACCACCA AAATAAACTG 2 820 
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GATTTTTCGT TTCGTCGTTT TCCTTAACAA CCACCTGGTC GCCACCACGT ACTTCAGCCA 2 880 

AGTTCAAATT GAGCAAAGCA ACTTTCCCTA TCTCATCATG ATTTCCATCG CCATAAGAAA 2940 

ATCCCATACT TAAGGTCAAG GGCAACTGTC TCTGTTTCGA CTCTTCTCTG AAAGCATCAA 3 000 

TAACAGAAAA TTTATCATTC ATCAAGCCCT CAAGCACCGT GTAGTCAGTA AATAGATAAA 306 0 

ATCGATCCAT ACTTACCCGA CGAGAAAACA TCATGTGTTT TTCTGAAAAC TCTGATATAA 3120 

AATTAGCTAC AAAACTATTG ATTTGACTAA TATCTGACTC AGAAGTTTCA TCCTCCAAAT 3180 

CATCATAATT ATCCACAGAG ACAATCCCAA TCACTGGTCT ACTTGTTACC AATTCATCTG 32 4 0 

TTATGGCTTG TTCCCTGGAT ACATCTACAA AATACAAAAC ACCGGAAGAA GCATCCATAT 33 00 

G AAC AG CAT A ACGCTTCTCA CCAAGCTTGG CATAAGTAGA CGGATTTCCT ACTGAAGCCT 33 60 

TGATAATCGT TTGAACAGCT TCTAAATCAA AATCACCATC TTCCTTGGTC AAAATCAATT 342 0 

CAGCATAGGG ATTAAACCAC TCAACCTCTC CAGAAGATAA ATTCAATTTC ATAACACCTA 3480 

CAGGCATCTG TTCCAATAGA GCTGTCAAAC TTTCTTCCGC TTGGTGGTTT ACATACTGTA 3 54 0 

TCTGTTCTAC ATCACTCCTT GTATAATGCA CTCTCAGTTT CTTAAATAAA AAAACATAGC 3 600 

C TC C T AC AAA AAGAAACAAA ATTAAAACCG TCAACAGATT ATTATTAACA AAAATAATGA 3660 

AAGTGGATAA GACTCCAAAC GCAATCAATC CTACTAGAAT AGGAAAAATT GGACTTACAT 37 2 0 

AAAATTTTTT CATTCAAAAC CTCTTGGCAC CCATTATACC ATAATACCCC TCAAAAAGCG 37 80 

ACTTTTTAAA AGTGTAATCA GTAATTCTAT CAATTATAAG AAAAAGGTAG TTTACAATTC 384 0 

AGTAAACCTA CCTTTACACA TATTGAAATT AAGATTCTTT AACCTCTAAC AAACCAATTT 3900 

CGCCATCCTC ACGACGATAA ATCACATTGG TTGTCTGATC TTCAACATCC ACATAGATAA 39 60 

AGAAATCATG CCCCAATAAA TCCATTTGTA GAATTGCTTC TTCCAAATCC ATTGGTTTTA 402 0 

AATCAATTTG TTTTGAACGA ACAACTTTAG ACTGGACAAT ATTTGAATCT TCCACCAAAG 4080 

CATCTGTAAA TAATTGACCA GTTGCTACCT TATTTTTATT TTTACGCTCG ATTTTTGTTT 414 0 

TATTTTTACG AATCTGACGT TCAATTTTAT CAGTTACAAG GTCAATTGAA CCATACATAT 4200 

CTTGAGATAC ATCTTCTGCG CGGAGAGTAA TAGATCCAAG CGGAATCGTT ACTTCCACTT 4260 

TAGCCGTTTT TTCACGATAA ACTTTTAAGT TAATTCGGGC ATCCAACTCT TGTTCTGGTT 4320 

GGAAGTACTT TTCGATCTTT TCGAGTTTAG AAACTACATA ATCACGAATT GCTTCTGTTA 43 8 0 

CTTCTAGGTT TTCACCACGG ATACTATATT TAATCATATG AGTACCTTCT TTCTAAACAT 4440 

TTTTGTTTTT ATGATTTTAT TATAACGCTT TCATTCTATT TTTGCAAATT TTTTCCTCAT 4500 

CTTACAAGGG AAAATGTTTT TACATCCTTA GCACCAGCTT CTTCCAACAG TTTCTTAACA 4560 
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CGATTTATAG TTGCTCCTGT AGTATAGATA TCATCTATAA GTAGGATTTT TTTAGGAATA 4 620 

GTGACTCCAC TTTTAATAAA GAAAGGAAGT TCTGTCCCCA AGCGCTCTGA ACGATTTTTA 4 680 

GAAGAACTGG CTCTCTCTTC TCTTTTCTCT AATAAATCCA GATACTCAAA GCCTGCTGCC 4740 

TCTACCAAGC CCTCAACCTG ATTAAATCCT CTATTAGCAT ATCTATCAGG ACTTAGGGGA 4 8 00 

ATTACAACAA ATTGATACTC TTTGTACTTT TTCAACTCCT CACTTAAAAA TGAAGCGAAA 48 60 

ACTTTTCTTA ACAGGAAGTC TCCATCAAAC TTATACCGAC TGAAAAAATC CTTCATAGCT 4 92 0 

TGATTGTAAG TAAAAATCGC TCTATGACTG ACTTCAACTC CCTCTTTACA CCAAAGTTGA 4980 

CAATCTTGAC ACTTTGTTGA CAACTCTGTT TTCATACAAT TTGGACAGTT CTCTTCCCCA 504 0 

ATTCTTTCAA AAGT AGAAT C ACAGTCTGAA CAAAGACAAG AGTCATCATT CCTCAGAAGT 510 0 

AAGAGACTAC TAAAAGTTAA AACAGTCTTC ATAGTCTGCC CACATAACAA GCACTTCATA 5160 

GACCAGCCTC CTTATTCATC ATCTGAATTT CCTTAATCGC CTTCTTGATT GAAGCATTTA 52 2 0 

ACCCATCATG GAAGAAAAGC AAATCTCCTG TCGGTCTATC CATGCTTCGT CCAACTCGTC 52 80 

CACCAATCTG AATCAAACTA GACTTGGTAA ACAAACGATG ATTGGCCTCT ACTACGAAAA 53 4 0 

CATCCACACA AGGGAAGGTA ACTCCGCGCT CCAAGATTGT CGTACTGATA AGTATTGTCA 54 00 

GTTCTCCATC TCGAAAAGCT TGTACTTGCT CTAATCGATC CTCTGTTACA GAAGATACAA 54 60 

AGCCAATTTT CTCATTTGGA AATTGCTCCT GTAAGATTTC TGCTAACTGC TCCCCTTTCT 552 0 

TAATTTCTGA AGCAAAAATG AGTAACGGAT AAGCTGTCTT TCTCTGCTTC TCAATATAGG 5580 

ACTTTAACTT TGGTGACAAA CGATTCTTGT CTAAGTAGCG ATTAAAATCC GATAACCAAA 5640 

TTGGTTTTGG AATAATCAAC GGATTTCCAT GAAACCGTCT CGGTAAATTC AGTCTTTTTA 5700 

GTTCTCCTAA ACGGACCTTT TTATCTAACT CATTGGTCGA AGTCGCTGTT AAAAAGATTC 5760 

TCAATCCATT CTCCTTTACA CTATTCTTGA CAGCGTGGTA AAGCATGGGA TTATCAACAT 582 0 

AAGGAAAAGC ATCTACTTCA TCCACTATCA GCAAATCAAA AGCTTGATAA AACTTCAATA 588 0 

ACTGATGGGT TGTTGCAACA ACTAGTGGTG TTCGAAAATA AGGTTCCGAT TCTCCATGTA 594 0 

GCAAAGCTAT CCCGCAAGAA AAATCCTGTT GCAGGCGCTT GTACAGCTCC AAACAAACAT 6 00 0 

CTATGCGAGG ACTAGCCAAA CACACTGCAC CACCCGCATT GATCACTTTA GCCACTACTT 6060 

GAT AAAT CAT TTCTGTCTTT CCAGCTCCTG TTACCGCATG AACTAAGGTT GGCTTTTGCT 6120 

TGTCTACTAC TTGAAGCAAT CCCTCTGACA CCTTCTCTTG AAAAGGAGTT AATTGGCCGC 6180 

GCCATTTGAG AACATCTTGC TTTGGAAAAT CCTCCTGCGG AAAATAGTAT AAAGTTTGAT 6240 

CACTTCTGAC TCGCTTCATC AGCAAGCACT CTCGACAATA GTAAGCACCG ATGGGCAAAT 6300 

ACCATTCTTC TAGAATAGTA CTATTACAGC GTTGACAGAA AAGTTTCCCC TTCTCCTTTC 6360 
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TCATTGCTGG AAGTTTCTCC GCCAACTGAC GTTCTTCTTC TGTTAATTCA TTCTCAGTAA 642 0 

ATAAACGACC GAGATAATCT AAATTTACTT TCATACTTCT TTATTCGTAA AAACTAGCAC 6480 

TTTAGATGAT TTTTTAGTAC AATTAAATCA TGGAATTTAG GACAATTAAA GAGGACGGTC 6540 

AAGTCCAAGA AGAAATCAAA AAATCTCGCT TTATCTGCCA TGCCAAGCGT GTTTATAGCG 6 600 

AAGAAGAGGC TCGTGACTTC ATTACTGCCA TCAAAAAAGA ACACTACAAA GCGACACATA 6660 

ACTGCTCTGC CTTCATTATT GGAGAACGTA GTGAAATTAA ACGTACAAGT GATGATGGTG 6720 

AGCCTAGTGG TACTGCTGGT GTTCCCATGC TTGGGGTACT AGAAAATCAC AATCTCACCA 67 80 

ATGTCTGTGT GGTCGTGACA CGCTACTTTG GTGGTATTAA ACTAGGCGCT GGAGGACTAA 6840 

TTCGTGCTTA CGCCGGCAGT GTCGCCTTAG CTGTCAAAGA AATTGGTATT ATTGAAATAA 6900 

AAGAACAGGC TGGCATTGCT ATTCAAATGT CTTATGCTCA GTACCAAGAG TACAGTAACT 69 60 

TCCTTAAAGA ACATGGTCTC ATGGAGCTGG ATACAAACTT TACAGATCAA GTCGATACGA 7 02 0 

TGATTTATGT TGATAAAGAA GAAAAAGAAA CTATTAAAGC TGCACTTGTG GAGTTTTTTA 7080 

ATGGAAAAGT CACTTTAACT GACCAAGGTT TACGAGAGGT TGAAGTTCCT GTAAACTTAG 714 0 

TGTAAACAAT GAATAATACA GCGTTTCGTT GACATTCTCA CAACTACTTT AGCGAGCAAA 72 00 

ATAAAAAGAG GCGTACCAAA ATATACTAGA AAATGAAGCA ATTCAAACGA AACCTGATAT 7260 

CGTTTTCCTT CACACCTATT TACTAGAATT AGCTGAACGC AATCACTTGA AAATTAATGA 73 2 0 

CTTTGATCTA TGATATATAG AAATGGTATG GATAGCGTTA TACTAAAGAT ATCTTATACA 7380 

AAGAGGTATT CATATGTCTA TTTATAACAA CATTACTGAA TTAATCGGTC AAACACCGAT 7440 

TGTTAAACTT AACAACATCG TGCCAGAAGG TGCTGCAGAC GTCTATATAA AGCTTGAAGC 7 500 

ATTTAATCCT GGTTCATCTG TAAAAGACCG TATTGCCCTT AGCATGATTG AAAAAGCTGA 7560 

ACAAGATGGT ATTCTGAAAC CTGGTTCTAC TATTGTTGAA GCAACAAGTG GAAACACCGG 7 62 0 

TATTGGACTT TCATGGGTAG GTGCTGCTAA AGGGTATAAA GTCGTCATCG TTATGCCTGA 7 680 

AACTATGAGT GTAGAACGAC GTAAAATTAT CCAAGCTTAT GGTGCTGAAC TCGTCCTAAC 774 0 

TCCTGGTAGC GAGGGAATGA AAGGTGCTAT TGCTAAGGCT CAAGAAATCG CTGCTGAACG 7800 

TGATGGTTTC CTTCCTCTTC AATTTGACAA TCCAGCTAAT CCAGAAGTAC ACGAAAGAAC 78 60 

AACAGGAGCT GAG AT AC TAG CTGCTTTCGG TAAAGATGGA TTAGATGCCT TTGTTGCTGG 792 0 

AGTAGGTACT GGTGGAACGA TTTCTGGTGT TTCTCATGCA CTCAAATCAG AAAATTCTAA 7980 

CATTCAAGTT TTTGCAGTAG AAGCAGATGA ATCTGCTATT CTATCTGGTG AAAAACCTGG 8040 

TCCTCACAAA ATTCAAGGTA TCTCAGCTGG ATTTATTCCT GATACACTTG ATACTAAAGC 8100 
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CTATGATGGT ATCGTTCGTG TAACATCAGA TGACGCTCTT GCACTCGGAC GTGAAATTGG 8160 

TGGAAAAGAA GGCTTCCTTG TAGGGATTTC CTCAGCTGCA GCTATCTACG GAGCCATCGA 8220 

GGTTGCCAAA AAATTAGGTA CAGGTAAAAA AGTCCTTGCC CTAGCACCAG ATAACGGTGA 8280 

ACGTTATCTC TCTACAGCAC TTTATGAATT GTAACCGTCC AATAACGAAG TCTATTGAAA 8340 

AATCTCCAGA CTAGAGAACT CACGGATAGT TCCTAATCTG GAGATTTCTT ATTTGCACTT 84 00 

TTCTTGTACA ACTTTAGTCC ATGGTAAATA GGCCTCTAAA ACCTCTTTGT TTACGAGAGT 84 60 

TTCCACGTTT GGAAGACATT CTAGAAGATA GGATAGATAT TTCTCACTAT TTATAATGGA 8520 

TTGAAATAAG ATATGAACAA ATCGATTAGA ACATGATGGT AAAGCGTAAT CCCTTGTTTC 8 5 80 

TCAGCTTTCC CAGACAAAAA AGTCCAATAG TAAGTCAGCT GACTATCACT CTCTAGCACC 8640 

CTATAAGAAG TTTCATCCGC ATGAAGTAAG GGCTGAGTCA ATAGTCTCTC TCGCAAGAGG 8700 

TTATAAAGGG GCTCCAAATA GTATTGACTC GTCTTGATAT GCCAATTAGA GATTTCCTTA 87 50 

CGTGTGATTG GTAAACCCAT CCTAGCCCAA TCTTCTTCTT GGCGATAATT GGGTACCTTC 882 0 

AGATTAAACT TCTGATGGAT GGTGTGAGCG ATAATAGAAG CTGAGCCAAA GTTATGCGCT 8 8 80 

AAAGGGGCTT TAGGAATAGG AGCTTTCACA AGCTTATCCA GATGATTATC TTTTACTCGT 894 0 

TATGGACAAT GCTATATGGC ATAAATCAAG TACCTTAAAG ATTCCGACTA ATATTGGCTT 9000 

TGCATTTATT CCTCCATACA CACCAGAGAT GAACCCCATT GAACAAGTGT GGAAAGAGAT 90 60 

TCGTAAACGT GGATTTAAGA ATAAAGCCTT TCGAACTTTG GAAGATGTCA TACAAGGACT 9120 

GGAGAAGGAG GTGATAAAGT CCATCGTTAA TCGGAGACGG ACTAGAATGC TTTTTGAAAA 9180 

CAGATGAGTA TAAAAAGAAA GTCCTCATTT CAATAGAAAT CACGACTTTC TGATGAATTT 9240 

ATAGTAAAAT GAAATAAGAA CAGGATAGTC AAATCGATTT CTAACAATGT TTTAGAAGCA 93 00 

GAGGTGTACT ATTCTAGTTT AAATCCACTA TATTTGGGGA GTGATAGAAA AGCCCTTCAT 9 3 60 

CAGCCAATCT ACTTGTTCAG GTGCGAGAGC TTTGACATCC TTTTCTGTAC TGGACCAAGT 9420 

CAGTTTTCCG TTCTCAAAGC GTTTATATAA TATCCAAAAT CCTTGACCAT CCCAGTAAAG 9480 

AACTTTAAAG CGGTCTTTAC GTCCACCACA AAAGAGAAAG ACTTGATCGG AGAAAGGATC 9540 

CAATTCAAAG TGGGTTTTAA CTACATAGGC TAATGAGTCT ATTCCCTGCC TCATATCTGT 9600 

CTTGCCACAA ACAAGGTGAA CTTGACCTAA ATCACTTAGT TGAATTATCA TAGTACAATA 9 660 

CCTTTCCTCC GATAATTATT TTTTATCTGG TATACTGGAA GTTGGGGAAT TAGGATAGAT 9720 

ACCTTGTTAT GACGCGCTTA CTATGAATTT GAAGTATAGT CTCCTAAATG CACTTAGCCC 97 80 

TTATTATAGG GCTTTTTGTT TTAATTATTC TAATCGAGTG AGACTGGGGA AAAAACAATT 9840 

TCAGGAAAAA TCTAAGCCCT ATACAAAAAA GGAAGCAATT TGCTTCCTTT CTATTATTAG 9900 
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TTATTCAAGG CTGCTGCCAT TGTAGCTGCA ACTTCAGCTT CGAAGTCGTT TGCAGCTTTC 99 60 

TCGATACCTT CACCAACTTC AAAGCGAGCA AACTCAACTA CCGAAGCGTT AACTGATTCA 1002 0 

AGGTATGCTT CAACTGTCTT GCTGTCATCC ATGATGTAAA CTTGTGCAAG AAGTGTGTAA 10080 

GCTTGGTCAA CTTTAGTGTT ATCAAGCATG AAGCGATCCA TTTTACCTGG AATAATTTTG 10140 

TCCCAGATTT TTTCTGGTTT GCCTTCTGCA GCCAATTCAG CTTTGATGTC AGCTTCAGCT 10200 

TGAGCAATAA CATCATCAGT TAATTGAGCT TTTGATCCAT ACTTCAAGTG TGGAAGAGCT 10260 

GGTTTATTAA CCATTGCACG GCTTTCGTTG TCTTGGTCGA TAACGTGATT CAATTGTGCC 10320 

AACTCATCTT TAACGAATTG CTCATCCAAT TCTTTGTAAG AAAGAACTGT TGGTTTCATC 10380 

GCTGCGATGT GCATTGACAA TTGTTTAGCA AGTGCTTCGT CTCCACCTTC AACAACTGAA 10440 

ATAACACCGA TACGTCCACC GTTATGTTGG TATGCTCCAA AGTGTTGTGC GTCTGTTTTT 10500 

TCAATCAATG CAAAGCGACG GAATGAGATT TTCTCTCCGA TAGTTGCTGT TGCAGATACG 1056 0 

TATGCAGCTT CAAGAGTTTC ACCTGAAGGC ATTATCAAAG CAAGAGCTTC TTCGTTGTTA 10620 

GCAGGTTTTC CTTCAGCAAT GACTTTAGCT GTAGTATTTA CCAATTCAAC GAATTGAGCG 10680 

TTTTTTGCAA CGAAGTCAGT TTCAGCGTTT ACTTCAATAA CTGCTGCAAC ATTACCGTTA 10740 

ACATAAACAC CAGTCAAACC TTCTGCAGCA ACACGGTCAG CTTTCTTAGC TGCCTTAGCC 10800 

ATACCTTTTT CACGAAGCAA TTCAATCGCT TTTTCGATGT CACCGTCTGT TTCTACAAGC 10860 

GCTTTTTTAG CGTCCATAAC ACCGGCACCA GATTTTTCAC GCAACTCTTT TACAAGTTTA 10920 

GCTGTAATTT CTGCCATTTT AATTCTCCTA TATTTTTTGA AAATAGGAGA GCGCGGCTAA 10980 

GCCCCGCCTC CGG 10993 
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8411 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

CGACGGGGAG GTTTGGCACC TCGATGTCGG CTCGTCGCAT CCTGGGGCTG TAGTCGGTCC 60 

CAAGGGTTGG GCTGTTCGCC CATTAAAGCG GCACGCGAGC TGGGTTCAGA ACGTCGTGAG 12 0 

ACAGTTCGGT CCCTATCCGT CGCGGGCGTA GGAAATTTGA GAGGATCTGC TCCTAGTACG 180 

AGAGGACCAG AGTGGACTTA CCGCTGGTGT ACCAGTTGTC TTGCCAAAGG CATCGCTGGG 240 
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TAGCTATGTA GGGAAGGGAT AAACGCTGAA AGCATCTAAG TGTGAAACCC ACCTCAAGAT 3 00 

GAGATTTCCC ATGATTATAT ATCAGTAAGA GCCCTGAGAG AT GAT CAG G T AGATAGGTTA 3 60 

GAAGTGGAAG TGTGGCGACA CATGTAGCGG ACTAATACTA ATAGCTCGAG GACTTATCCA 420 

AAGTAACTGA GAATATGAAA GCGAACGGTT TTCTTAAATT GAATAGATAT TCAATTTTGA 4 80 

GTAGGTATTA CTCAGAGTTA AGTGACGATA GCCTAGGAGA TACACCTGTA CCCATGCCGA 54 0 

ACACAGAAGT TAAGCCCTAG AACGCCGGAA GTAGTTGGGG GTTGCCCCCT GTGAGATAGG 600 

GAAGTCGCTT AGCTTTAATC CGCCATAGCT CAGTTGGTAG TAGCGCATGA CTGTTAATCA 660 

TGATGTCGTA GGTTCGAGTC CTACTGGCGG AGTAAT t GAT AAAAGGGaAC ACAGCTGTGT 72 0 

TCCTCTTTTT GTATCAATTT GTATCACCAA GCATTTTCAT AAGGAAGTCT GTTATTTCTT 780 

GAGAACTTTC TTTTTTTCCA TGTGCAATCC AAGTTTGGCA GACACCAAAA AGTGCATGAG 84 0 

TTAGATAGAT GCTACTATAT TCTAATTCAG TGGTATTTAG ATTCAGTTGC ATAAATCGCT 900 

TTTGTAAATC TGTACTAAGC ATGATATGAA GTTTATTTCG TAAGAAATTT TGGATTTCTT 960 

TAGTCCCATT TTCAGAAAGA AGGGCAGCCA GAAGTGGTTC TGACTCTAGA TATTCAAAAA 102 0 

CTTCTAAAAT AGCGTCTCTT TTGTGATGAG CATGTTTTTG AAAAATATAT TCAAATGTAT 1080 

GGAATAGCTT GCTTTGATAG TGCTCAATCA TATCATACTT ATCCTTATAG TGAGTATAGA 1140 

AGCTGGAACG ACTAATTCCG GCTTTTTCTA CTAATTTGAC AGTAGAAATT TTATCAAATG 1200 

GCTGTTCCAT CAGTAATTGT ACCATAGCAT TTTCAATAGT TCGCTTTGTT TTTAAGCGTT 12 60 

TGTTACTTTC TTGCATATTT CCTCCTTGTA AACAAATTAG ACTATATGTC TAAAAATAGA 132 0 

TTTTTTATCT TGTAATTTAG ATTTTTTAAT GTATAATCTA TTATATCAAA ATTTTAGACA 13 8 0 

ATATGTTTAA AAAAGGAGAA ACTAAGTTTA AAGAATGGAA AGCAATTTAA AAAAAACCAA 14 40 

CCTTTATTAT TGTCATGATC GGGATTTCTC TTATTCCAGA TCTGTACAAT ATCATATTTT 15 00 

TGTCATCAAT GTGGGATCCA TATGGGCAAT TGTCTGACTT ACCTGTGGCA GTTGTAAATA 1560 

ATGATAAAGA GGCTTCCTAT AATGGTAATA CTATGGCAAT AGGAAAAGAC ATGGTGTCCA 162 0 

ATTTAAAAGA AAATAAAACC TTGGATTTTC ATTTTGTAGA TGAAGAGGAA GGAAAGAAGG 1680 

GATTGGAAGA TGGCGATTAC TATATGGTAG TGACTTTACC AAGTGATTTA TCTGAAAAAA 174 0 

CAACTACATT ATCCAATATT CAATCGACAG CAGCTTATCA ATCATTGACA AGTGAGCAAC 1800 

AAACTGAGAT AAGTGATTCT GTATCTCAAA ATTCAACTGA TAGTATTCAA TCGGCTCAGT 1860 

CAATTGTAGC TTTAGTACAA GATTTACAGG GAAGTTTAGA AAACTTACAA AATCAATCTT 192 0 

CTAATCTTTC GACTTTAAAA AATCAATCTA ATCAAGTATC ACCTATTACT TCTACTTCTT 1980 

TGATAGGATT GTCAAGTGGA TTAACAGAGA TACAAGGAGA TGTTACTAGC AAATTAGTTC 204 0 
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CTGCCAGTCA GTCGATTGCA TCAGGTGTAA ACGCATATAC TACAGGTGTT GATAAAGTTT 2100 

CTCAGGGCGC AAGTCAACTA AGTGAAAAAA ATGCCACCTT GACAGGTAGT TTGGATAAAC 2160 

TAGTTTCAGG CTCAAACACC TTGACACAAA AATCTTCTAG ATTGACAGCA GGAGTTGGTT 2 2 20 

AATTACAATC AGGATCTGGG CAATTAGCAG ACAAATCCAG TCAGTTACTT TCAGGTGCTT 22 80 

CTCCATTAGA GAATAGAGCT AATAAATTGG CAGATGGATC TGGGAAACTA GCAGAAGGTG 2 340 

GAACAAAGTT AACTTCTGGA TTGGAAGATT TACAGACAGG ACTTGCTTCT TTAGGACAAG 24 00 

GACTAGGTAA TGCTAGTGAT CAACTCAAAT CAGTATCAAC AGAATCTAAA AATGCAGAGA 24 60 

TTTTGTCAAA TCCACTCAAT CTTTCAAAAA CAGACAATGA TCAAGTTCCT GTAAATGGAA 2 520 

TCGCAATAGC TCCTTATATG ATATCAGTTG CTCTTTTTTT GCAGCAATAT CAACAAATAT 2 580 

GATATTTGCG AAATTGCCTT CAGGACGTCA TCCAGAGAGC CGTTGGGCTT GGTTGAAATC 2 64 0 

TTGAGCTGAA ATAAATGGTA TTATAGCTGT TTTGGCAGGA ATTTTGGTAT ATGGAGGAGT 27 00 

TCAGCTTATT GGTTTAACTG CTAATCATGA GATGAGAATA TTTATTCTCA TCATCCTAAC 2 7 60 

AAGTTTAGTA TTCATGTCTA TGGTGACCAC TTTAGCAACG TGGAATAGCC GTATAGGAGC 2 82 0 

TTTTTTCTCA CTTATTTTGC TTTTACTACA GTTAGCATCA AGTGCAGGTA CTTATCCACT 2 880 

TGCTTTGACA AATGATTTCT TTAGATCTAT TAATCCCTGG TTACCAATGA GCTATTCAGT 294 0 

TTCGGGATTA CGACAAACAA TCTCTATCAA CAAGTCATTT TCCTAGCTGT CATACTAGTT 3000 

CTATTTACTA GTTTAGGTAT GCTAGCCTAT CAACATAAGA AAATGGAAGA AGATTAAAAA 3060 

AATCGACCGA TTAACTGGTC GATTTTTTAT GCCTTAGATG ACTTTCGTCT GTGATTATAG 3120 

ATTCCAAATA GTAAGAGAGA AGTAAAGGAA CAGATTGCTC CAGTAATAAA ACCATTGGGA 3180 

ATGAAGGAAA GTGTAATAGT TCCTTTCCCC TTGGGAATGT CAACTTTCAT AAATCCAGTT 32 40 

TGAGCTTGTT TAATTTCTAT TTTCTTACCA TCTTGGTAGG CAGACCAACC TTTGTCATAA 33 00 

GGAATGGTGA AGAAAATAGA TGTATCTTGT TGGACATCAT ATGTAGCAAA AACCTTGTTT 33 6 0 

TTAGAAGTTG ATACTGTGAC AGGTTGTTCT TTAATTTTTT GAATTGCCTC GGTGAAAGTT 34 2 0 

TTGGTATCTA AACGATAGAA GGTAGGAGAT TCAAATGATA CTTGTGAATT TCCAGGGAAA 34 80 

CTAACATTGA TATTGAAAGT TTTTTTCTCT TTAGTATATC C TAG AT T AAA GAAGGAGAAG 3540 

ACATTATCAG TTGTAAAAGT CTTTTTTTCA CCATTTACAA GGATGTCAAC CTTCTTTTGT 3 600 

TTATCGTTAG AAAAGTGAAG GTTTATGAAA GAGAGATAAA CTTGGCTGTT TTCTGGAACT 3660 

TCAATTTGAT ACTGGATTGC TGCATCTTCA TTTGAAGAAC TTGTGACACT AATCAAATCA 372 0 

TTAGTATTTT CTATTTTTTC TGTTTTTTCA TAAGGTATTG GAGAAAAATA ATCAAAATTG 3780 
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ACGTTAGCAA GTTGATTTAA AAATGAGGCC TGATTATCCA AGGTATGTTC ATTGAACTTG 3 840 

ACATCATTGT AAACAGATTG ACTCGCAACT GCAATCGGAA GAGAGTATTG ATTTTCATAT 3 900 

AGGGTAAGAT TATCTTTTTG ATAGATATCT TTAAAGCCAT ACTTATCAAT AGGACTGTCT 3 960 

GAGATATTGT ACTGGATACC AAATAAACTA TCAGCCAAAA TACTATTATT TGCATATCGG 4 020 

AG AT T GAG AT TAGTCCCAGA GGATTTAAAA CCAAGTTTAT CTAAAGTAGA GCTTGATGAA 40 80 

CGATTTCGAA CAGATGAAAA TTGAGAGATT CCATTGTAGT TGAATTTCAT ACTGTCATTT 4140 

CCTGTCTGAG TTTGTAGTTT TTCAGTACGA GTAAATTGAT TTCCAATATA TGTTGAGAAA 42 00 

GATTCCATAG CTGGGATATC TCGACTATAA GCACTTCGAG AAGCAAATCC CCATTCCTTA 42 60 

GCAATTCCGT CCATTTGAGA TGAAGCATTT AAACTCATTT CAACCAGTAT AAATAAAGAG 43 2 0 

ATTAGAATGG CAAATAGATT CACAGATATA AACTTTTTGA TAACTGCAAG GAGTAAAAGA 4 3 80 

GAATAGACAA CCAAAAATTC AAGAGTAAGC AGAATATTCA AATCTGTTAA AAAAGAATAA 4440 

TGCGATTTTA GATAGATGGT AGCTAAAAAT CCTGCTACTA CAAGAAAAAG CGAAACTAAA 45 00 

AAATTCCAGA CTTTAAGTTC TTTCAGACGC TTTAAGACTT CTGCTGCTGT GTAAATTAAC 4 5 60 

AAGGTAGAGA AAATCCAAGC ATAGCGATGT AAAAACATGT TTGGAGTATG CATGCCTTGC 4 62 0 

CAAAATAAGT CAAGAGCTTC TATGTAAAAG CTTGCAATTA GAAATGCAAA GAATATTACA 4 6 80 

TATATGAGTT TCACGTGAAA CTTAATAGAT TTCAGCGTAA AAAATAAAAT GGTCAAAATA 4740 

AAGGGAAATA GTCCAACAAA AATCATTGGG ATGGCCCCAT ACTTTGTTGT GTCAAAGGAA 4800 

CCAATGAATT GCTTAGCAAA GAGATCAAGA TACCAGCTAC TTTCAGTTTG AAACTTTGTA 4860 

ACTTCAGTCA ATTTTTCCCC ATGTGTCTGT AAATCAAATA GAGTGGGAAG AGTCATAATC 4 920 

AAACTAGCCA TACCAGCTAA AAAGGAGATA ACTATGAAAT CAAGAACAGA TGATTTTCGA 4 980 

GTCTTAAAGT CCCACGAAAT TTGACAGAGA TACCAGAAAA TAAGAAACAA TACTGTCATA 5040 

TATCCAAAAT AATAATTTTG AATAAATAAG ATTGACAGAC TTGTAAAGTA CAATAGGAGT 5100 

TTCTTTTCAG TTATCAGTAG ATGTAAACCA GTTATAATTA AAGGAATCAA GATAAAAACA 5160 

TCTAGCCAGG TTTTTATCTC TAATTGACTG ACAGTGAAAC TCATCAGAGC ATAGGAAGTA 522 0 

GATAAGGCTA GTTTTAAAAT CTGAGGGATA GATTGAAACA ATTTATTCAA ACTAAAAAAG 528 0 

GTTGACAGAC CAATCAATCC AAATTTTAAG AGAGTTGTCA GAT AG AT AG C ATCTGGCATA 5340 

TTCGTTAGAT CAAAAAAGTA AACCAGAGGC GCGAGAAAAC TACCCAAGTA ATAACTAGAT 54 00 

AGGGCATAGA AGTTTAGCCC TAGACCACTT GTAAAGGTGT AAAACAGATT ACTATTTCCA 54 60 

TGTAGGATAT TTCGTAAGGC TACATCAAAA ATAACGTATT GATGAAAGCC ATCTCCTAAT 552 0 

AGAGGAGAGT TGTCGCTATT CCAGTAGATA CTTTGAGATA GATATACTCC AG AC ATAAT C 55 8 0 
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ACTACAGGAA TGATGAAAGA AATAAAATAG GTTCGATATG TTTTTAAAAA TGATTTCATG 5 64 0 

TTACCTCGTA GAATGATAGA AAACTCAGTT GGTTAACCCA ACTGAGTTTT GAAGTTTTAT 5700 

TTAGTCTTTC CAAAGTTCTT TAACTTTTGC TTGTACTTCT GCATTTTCTA GGAATTCATC 57 60 

GTAGGTTTCA TCGATACGGT CAATGACGCC ATTTTTAGAT AAGACAATGA TATGGTTAGC 582 0 

CAAAGTTTGA ATAAATTCGT GGTCATGGCT GGCAAAGATG ATTGATTCTT TAAAGTTTTT 5880 

CAATCCATCA TTCAAGCTTG AGATAGATTC CAAGTCCAAG TGATTTGTTG GATCATCAAG 5940 

TACAAGGACA TTTGATTTTA AGAGCATGAG TTTTGAAAGC ATGACACGAA CTTTTTCTCC 6000 

CCCTGACAAG ACATTTACAG GTTTGTTAAC TTCATCTCCA GAGAAGAGCA TACGGCCGAG 60 60 

GAAGCCACGT AGGAAAGTAT TGTCATCTTC TTCTTTACTT GCGAATTGAC GCAACCAGTC 6120 

AAGAATTGAT TCTCCTCCTG CAAAATCAGC TGAGTTATCT TTTGGTAGGT AAGATTGACT 618 0 

AGTTGTAACT CCCCACTTGA CAGTTCCTTC ATAGTCAATA TCTCCCATGA TTGCACGAAT 62 4 0 

TAATGCAGTC GTTTGAATAT CATTTTGTCC AATAAGTGCT GTCTTATCAT CTGGACGCAA 63 00 

GATGAAACTA ATATTATCCA AGATAGTTTC ACCATCAATC TTTACAGTTA AATTTTCTAC 63 60 

TGTCAAGAGA TCATTACCAA TCTCACGTTC CGCTTTAAAG TTGATAAATG GATATTTACG 642 0 

ACTAGATGGC ACAATCTCTT CTAGCTCAAT CTTATCAAGC ATTCTCTTAC GTGATGTTGC 64 8 0 

CTGCCTTGAC TTAGAAGCAT TGGCAGAGAA ACGAGCAACA AATTCTTGCA ATTGTTTAAT 6540 

TTTTTCTTCT GCTTTAGCAT TACGGTCTGC TAGCAATTTA GCAGCAAGCT CAGAAGATTC 6600 

CTTCCAGAAG TCGTAGTTTC CGACATAGAG TTTGATTTTT CCAAAGTCAA GGTCGGCCAT 666 0 

GTGAGTACAA ACTTTGTTTA AGAAGTGACG GTCGTGGGAT ACTACGATAA CTGTGTTATC 6720 

AAAGTCAATC AAGAAGTCTT CTAACCAAGT AATCGATTGG ATATCCAAAC CGTTAGTAGG 678 0 

CTCGTCCAAG AGAAGAACAT CTGGTTTACC AAAAAGTGCT TTGGCGAGGA GAACCTTTAC 684 0 

TTTTTCACCG TTGGCCAATT CGCTCATGTT TTGGTAGTGT AATTCTTCTG GAATGTTTAG 6900 

GTTTTGAAGT AGTTGAGAGG CTTCACTCTC TGCTTCCCAA CCTCCAAGTT CGGCAAACTC 6960 

TCCTTCGAGT TCGGCAGCAC GAACCCCGTC CTCGTCTGAG AAATCTTCCT TCATGTAGAT 7020 

AGCATCTTTC TCTTTCATGA TGCTATAAAG TTTTTCATTT CCCATGATAA CGACATCAAT 7 080 

GGCACGTTCA TCTTCGTAGT CAAAGTGATT TTGACGAAGA ACAGAGAGAC GTTCATCTGG 7140 

AC C AAGAG AG ATGTGACCAG TAGTAGGTTC GATATCTCCA GCTAAAATTT TTAAAAAGGT 7200 

TGATTTTCCG GCACCATTAG CACCGATTAA TCCGTAAGTA TTTCCTTCTG TAAATTTGAT 7260 

ATTGACATCA TCAAAAAGTT TGCGATCACT AAAACGTAGT GAAACATCAG ATACTGTAAG 7320 
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CAATGTTTTT CTCCTATATG TGTAATATAT TTATTCTACT AGAAAATACA GAAATATTCA 73 80 

AATTTTTATT TGTCAATTTT GTGTAAATTA TATTTACAGT ATCCTTTACA CAAATCTGTA 7440 

AAAAGCAAGG CTGATTTATT TTGATAAATT ACGGTTATTT CATTAAAAAA ATGCTATAAT 7500 

TGAAAGGACT ATATCGAAGG AGAACAAAAT GACTAAACCC ATTATTTTAA CAGGAGACCG 75 60 

TCCAACAGGA AAATTGCATA TTGGACATTA TGTTGGAAGT CTCAAAAATC GAGTATTATT 762 0 

ACAGGAAGAG GATAAGTATG ATATGTTTGT GTTCTTGGCT GACCAACAAG CCTTGACAGA 768 0 

TCATGCCAAA GATCCTCAAA CCATTGTAGA GTCTATCGGA AATGTGGCTT TGGATTATCT 7740 

TGCAGTTGGA TTGGATCCAA ATAAGTCAAC TATTTTTATT CAAAGCCAGA TTCCAGAGTT 7800 

GGCTGAGTTG TCTATGTATT ATATGAATCT AGTTTCGTTA GCACGTTTGG AGCGAAATCC 78 60 

AACAGTCAAG ACAGAGATTT CTCAGAAAGG ATTTGGAGAA AGCATTCCGA CAGGATTCTT 7920 

GGTCTATCCA ATCGCTCAAG CAGCTGATAT CACAGCTTTC AAGGCTAATT ATGTTCCTGT 7980 

TGGGACAGAT CAGAAACCAA TGATTGAGCA AACTCGTGAA ATTGTTCGTT CTTTTAACAA 804 0 

TGCATATAAC TGTGATGTCT TGGTAGAGCC GGAAGGTATT TATCCAGAAA ATGAGAGAGC 8100 

AGGGCGTTTG CCTGGTTTAG ATGGAAATGC TAAAATGTCT AAATCACTAA ATAATGGTAT 8160 

TTATTTAGCT GATGATGCGG ATACTTTGCG TAAAAAAGTA ATGAGTATGT ATACAGATCC 822 0 

AGATCATATC CGCGTTGAGG ATCCAGGTAA GATTGAGGGA AATATGGTTT TCCATTATCT 82 80 

AGATGTTTTT GGTCGTCCAG AAGATGCTCA AGAAATTGCT GATATGAAAG AACGTTATCA 8340 

ACGAGGTGGT CTTGGTGATG TGAAGACCAA GCGTTATCTA CTTGAAATAT TAGAACGTGA 8400 

ACTGGGTCCG G 8411 
(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH : 9064 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

TGCCGTACTC AAGTACAGCC TGCGCTAAGT TTCCTAGTTT GCTCTTTGAT TTTCATTGAG 60 

TATTAGTAAC CAAAATCCGA CCACATAGCC AGCCCCTATG AATATAGCCA TTAAAGCTAG 120 

CATGGAATTT AGGAAATTAA AAACCACCGC AGATACAAAG GTTAGCACAA AAACATTAAA 180 

AGCAATGGTG TCAGAAGCCA AGACTAGAAT ATAGGGTGTC AACCGATCTA AAGTTTTGGA 2 40 

ATCTAGGAAA AATAAGTGTT TATACATGAT GACCTCCTCT ATGGCTGAAA AGCAAGCCTT 3 00 
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TTGTTTTTTT ACCCCAAGAC CCTATGTAGA AAAGTGAGCA AAAACGGGAA GGTCGCTACA 3 60 

ATATTATTGA TCACATGCAC CGCATAGGAT GGATAAATGC TCTTGGTATA GCGGGTCAAA 42 0 

CCAGCAAAGA TGATTCCAAC TGTTGCAAAG ACGAAGATAT CTAACAGACT AGGCAGGCTT 480 

GAAAAATGAG GGAGAGCAAA TAAAATAGAA GGAAGAAGCA AATCAAGACC AAATCGCGAA 540 

TGCTTAAAGA AAGCATGTTG CAGTAATCCT CTATAAATCA ATTCTTCCAT CAGTGGAACC 600 

AGAAAGAACA GGGCTATATA AATACCTAGC TCTGCAAAGT TAGTCCCACT ATAACCAATC 660 

AATACAGCCC AACCTTCCGC AGTTGACTGA ACATGTTTAG CTGTCTGAAC GTTAAAAGAG 72 0 

ATCTGGAACA CTAGCACTAA TACTGTCAAA ATCGAATACC AAAGCCATTT TTTTCTTGGA 7 80 

ATGCGGAAGA GATAACCATG GCCTGTCTTA ACAAGAACCA CAATCATGAC TCCAATAAAA 84 0 

AGTAAACTCA AGATATTTTG AATCCAGAAT AAATTGCCTA TCTGAGAAGA AAATTGCCAA 9 00 

TAGTTTTGGA CGATAAGCGT CAGCTGAGAA AGACTAAATA CGAAAAATAA GTAAGAGAAG 9 60 

ACTGCACTTA TTTTGAATAG AAGTTGATAC TTTTTCATAG AAATCCTCCC TACTATGACC 102 0 

TCACCTTGTC AGGCTCTACT GCTGTAAGAT TAAGAAGACA GTTTGTTTTT TTTAAGGCTA 1080 

ACCTGACTAC TAGATAATAG ATACATTAAG GCATTAAAGA CAATGAAAAT ATGTCCATAG 1140 

AATAAAATCA ACCTCGCATC CAAACCAAGA TAAAGTTTGA TTATCAAAAA GATGAGCAAA 1200 

AGAATTTGAA ACCATAAGGT TTTTCCAAAA ATAAATTTAA AGCGATTTCG AATATCTACT 12 60 

TCCTTGATTT TTACCGCCAC CCCTTTATTA GCAAGAAGGA AAACTCCTGC TTCAAACAAA 132 0 

CCACTGTAAA GAACAAGCCA CCCAATAGAT ACGATAGAGA TTTGTAAAAA TGTCCCTAAA 1380 

AGAATATCCA ACACACTACT CAAGAAAATA ACAAAAAATA ATCTGTATTT CATATTAAAT 1440 

ACCTCCATTC ATTTATTTCA CTAACAATTT AATAGAGCCT TCTACTCAAA TATCCTGTCA 15 00 

GAAAAGGATA GAAAGCTACT TTTTATAATA CTTCAAGCCC CACATGAGCA GAAGCGTGAT 15 6 0 

AAACAAGCAG AGAATACACC TATATAAGCG ATTAGTTGTT GATAGAATTC TGTTTCTGAA 162 0 

ATACCTCTAT ACAAACAAAT GACAAACATA AAATCTGCCA AGCCGATAAA CATAAGTTGA 168 0 

TTGGTTCTAG GACTAACCAA ATCATCATTT ACTTATATTT AAGAGTATCT CTTTTATTTT 1740 

AATGTATGTT AGCACTGAAA AGCAAGACAG GCCAATAATA TTTAAAATGA ACAGTAACGG 1800 

GGTTAAGTCT CTAAAAAAAT TATCTACTGA CACTACAAGA AATACTATAC ATATTATAGT 1860 

CGAAACTATC TTTTTCTTAT CCATAATTAT TTACTCCTTT CCTAACAAAT CCAGCTTATC 192 0 

AATCAAGAGC GATTTTTAAC ATAATGTAGC AGCACCCGTT GCAACTTTGA CAAGTTTAGT 1980 

ATATCATTGT TTTTTAAAAT TTTTCATCCA AATCTTGAAT TGTCATCGAA ACATCTTGAA 204 0 
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TTGTTAAAAA ATTTAAAAAG TAAGCATTAA AAACATACTT TCCTCTTTAT ATTGTATTGA 2100 

TACCAACTTG TTTGTAGACT TTTCATCCTG CTATCACATA TCATTTTGAC AGGCGAAACA 2160 

ATATTAAAGA AACTCCCCTG TAAATTAAGC TAGCAAATAC AGGGGAGAAA TTTATTTTTT 2 22 0 

AGAGAGTACT ATCCGTATCC TTTTTGGAAG ATTTTGAAAA TATTTTTCTA ATTAAGTCAT 2280 

CCATATAAGG ACCAAATATA CCAACTACTA AACCAATAAT AAAACTTTTA AAATCCATAA 2 340 

TTACCACCAA CATATTGCTG CATAGGCTAC ACCTCCAAGT AT AGCT C C AC CTGCAGCACC 2400 

AGTTACACCT ATTCCTATAG CAAATGGTCC CAATAGAAAT GTCAAACCGT TGTTGCACAC 2 460 

CCATCAATTG CGCCATATGC AACCCCTGCT GCACAACTAA TTTTTCTTCC CCAATCAATA 2 520 

TCTCCACCTT CAACGCAAGC AAGCATTTCA TTATCCATAA CTGCAAATTG TGACATCATT 2 580 

TTTGTATCCA TATAGTGTAT CACTTTTCAG TTACGGAACA AGTTTAATAT AAAAATTATC 2 640 

AAAAAAACAT AGGCAATAAA GAGAAAAATT AATTTATCAT AGATTAGAAA TAATATGACA 2700 

AAACAATTCA ATGATGTTAA TTCAATAGTC TTTTGTTTTT TATCGGAGAT ACTTATGGAT 27 60 

AGATAAATAA GATAGGTTTG AAAAGCGAAG AGAATAATAA AGAATATAGC CTTCATAAAA 2 82 0 

TTTAGCTTTC ATTTTTATGA TGTAGCGGTA TAGGCTAAAT ATCCACAAAC CACTGCTCCT 2880 

CCAATTCCTC CTATTGCAGC GCCCCATGGT CCTAGAAGTC TCCCATATTT CACTCCACCC 2 9 40 

GCTGCACAAC CTAAAGCAGC AACTACAGCT GCTCCTCCGG AATTACCTCC ATAAACCTCA 3 000 

CTCAGCATTG TTTCATTTAT ATTACAATAA GTATTCATAC AAGTCTCCTT TTATTAAAAT 3060 

CCACCCGTTG CCCCTGTTAC TCCTGCCCAA AGATCCACAC CAAATTTAGC TCCTATGTAT 3120 

CCACATGCTC CCATAAATGG TGCTCCAACA CCACTCGCAG CACAAATAGC TGTCCCTAGC 3180 

CCCCAGCCAC CAAAAGCAGC ACCACCACCT TCTAAGACAT TAGTTTGCCA ATTATTCTTG 3240 

CCTCCTTCAA TACTAGATAA CATAGTTATA TCCATTTCAT GAAATTGTTC CATAATTTTT 3 3 00 

GTATCCATGA CAAATACTCT TTTTTATTTT TAATTTTTGT CTTGTTGTAA CTTTGACAAG 33 60 

TTTAGTATAT CATCGTTTTT TAAAATTTTT CATCCAGATT TTGAATAGTC ATCGAAACGT 3 420 

CTTGAATTGC AAAAATTACA TTAGACTTCC TGCAAAACTA GAATCCTAGT TCATGATTGA 3480 

TAATACCAGC ACTCAAATTC ATTCGTAATC CGAAGCGTTT ACGATGACTT CGATAGGTTG 3 540 

TTGAAAACAT TTTAAACGTT TTTACTTTGG CAAAGATGTT CTCAACCTTG CTTCTCTCCT 3 6 00 

TAGATAGCGC ATGGTTACAG GCTTTATCTT CAACTGTTAG CGGTTTGAGT TTGCTGGATT 3 650 

TACGTGAAGT TTGTGCTTGA GGATATATCT TCATGAGCCC TTGATAACCA CTGTCAGCCA 372 0 

AGATTTTACC AGCTTGTCCG ATATTTCTGC GACTCATTTT GAACAACTTC ATATCATGAC 3780 

AATAGTTCAC AGTGATATCC AAAGAAACAA TTCTCCCTTG ACTTGTGACA ATCGCTTGAG 3 84 0 
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TCTTCATAGC GTGAAATTTC TTTTTACCAG AATCATTCGC TAATTCTTTT TTTAGGGCGA 3 900 

TTGATTTTTA CTTCCGTCGC ATCAATCATT ACCGTGTCCT CAGAACTGAG AGGAGTTCTT 3960 

GAAATCGTAA CACCACTTTG AACAAGAGTT ACTTCAACCC ATTGGCTCCG ACGGAGTAAG 402 0 

TTGCTTTCGT GAACACCAAA ATCAGCCGCA ATTTCTTCAT AAGTGCGGTA TTCTCGCACA 4080 

TATTGAAGAG TGGCCATAAG AAGGTCTTCT AGGCTTAATT TAGGTTTTCG TCCACCTTTT 414 0 

GCGTGTTTAA GTTGATAAGC TGTTTTTAAT ACAGCTAGCA TCTCTTCAAA AGTCGTGCGC 4200 

TGAACACCAA CAAGACGCTT AAATCGTGCA TCAGTTAGTT GTTTACTTGC TTCATAATTC 4260 

AT AG AAC TAT AGTAAAATGA AATAAGAACA GGATAAATCG ATCAGGACAG TCAAATCGAT 4 320 

TTCTAACAAT GTTTTAGAAG TAGAGGCGTA CTATTCTAGT TTCAATCTAC TATACTATAC 4 3 80 

CATATTTTGT TTCGCAGGGA ATCTATTATA AAAGGGTAAG TATTGCAAAA ACACTTACCC 4 440 

TTTTCTTTTA TACTTCATTA AGCTCTACTT TTTATAATAC TTCAAGCCCC ACATGAGCAG 4500 

AAGCATGATG ATTAAGCAGA GAACAGCGCC AATATAAGCG ATTATTTGTT GGTAGGATTC 4 560 

TCCTGCTGTG ATACCTCTAT ACAAACAAAT AATAGACATA AAACCTGTCA AGCCGATGAA 4 620 

CATAAGTTGA TTGGTTCTAG GACTAACCAA ATCATCATCT TCAAACTCTC TTATCCTCAT 4 680 

TTCCCTAGTG AGATAAACAG TAACCAAAAT AGAAGCCAAG TTAATAACTA CTAAAAGAAA 4 740 

TTGGAAAACT ACGGAAAAAT TTAAAAACTG ACGAGATAGA AATAGATAAG TAGAAACAAG 4800 

CAAGGGCAAC TGACCTAAGA ACAATCTCGC AAGGAAGATG TTCCGTTTTT TAGCAAGAAA 4860 

AGTTTTCATT TCTTTTCTCC TTTCTTTTTA TTGATAGCAA AATAGATCAT AACTGCAATC 4 920 

ACATAGGCTA TGGTATAAAA TAGCTGATAC CAAGCACTCT CCCTAAGCGG ATATAGAAAG 4 980 

ATGGACATGA TTAGATACAG AACGAAAATA ATCAGTATTT TTTTCTTCAT AAGATTTCCT 5040 

CCTAAATGTG CGATTTATCT TAGTTGAGCA AGAACATTTA CACTGCTAGT ATAGCACTTA 5100 

TTTTGACCTT GGATCACTCA AATCATAAAT GGTCATCAAA ACCTCTTGAA TTGTAAAAAT 5160 

TAAAAAAGCA AGCATGAAAA ACATACTTTC CTCTTTATAT TGTATTGATA CCAACTTGTT 52 20 

TGTAGACTTT TCATCCTGCT ATCACATATC ATTTTGACAG GCGAAACAAT AT T AAAG AAA 5280 

CTCCCCTGTA AATTAAGCTA GCAAATACAG GGGAGAAATT TATTTTTTAG AGAGTACTAT 53 40 

CCGTATCCTT TTTGGAAGAT TTTGAAAATA TTTTTCTAAT TAAGTCATCC ATATAAGGAC 5400 

CAAATATACC AACTACTAAA CCAATAATAA AACTTTTAAA ATCCATAATT ACCACCAACA 54 60 

TGTTGCTGCA TAGGCTACAC CTCCAAGTAT AGCTCCACCC GCAGCACCAG TTGCTGCACC 5520 

TTGCCATGTT CCTGTTTTAA TGCCTAGTTG AAGACCTCTT GCTGCTCCTC CTCCAACACC 5580 
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TGCTTTGGCA AAATCTCCCC AATTGCATCC GCCACCTTCA ACGCAAGCAA GCATTTCAGT 5640 

ATCCATAACA GAAAATTGTG ACATCATTTT TGTATCCATG ACAAATACTC CTTTTTTAAA 5700 

AAACTAAAAT AAATCAGAAT AGAATCCTCA TAATTTTACT ATAAGTCTTA CCAACTTAGT 57 50 

CCCAATTTAT CACCAACCAT ACCTCCTAAG CATGTTAATC CACCCCCAAT TGCACCAATG 5 820 

TGTGCTCCAA CAAATGCACC AGCAAGTCCA GCTACTCCTA AAGTGGCCAA ACCTGCTCCA 5 8 80 

GTTCCACCAG TTATAATTCC CGTAGTGACT CCTGTAATCA GTGCATTTTG ACAATCAGTG 5940 

GAGCTATACC CCCCTTCAAC TTTCGCAAGC ATTTCAGTAT CCATAACCTC TAACTGTGAC 6000 

AACATTTTTG TATTCATGAT GAATACCTCC TTTTTATTTT CAATTTGTTA CCAAAGTCTT 6060 

AAATTCAATA AACAAATAGA TTTTTTATAG TATCTTTTTG ATTTTCTTAA AAAAGTATAT 612 0 

ACGTCTACTA TCTTCTTAAA GGTAGCAGTA CCTATTTTTT AGTCTAAGAT TTCAATAATC 6180 

TTGAGTATCT AAAATATCTT AATTTCGTTA TTCTCCTTGC AATAAAAAGT TTTACTATAC 62 4 0 

TATTTATTAA CTTGCAGAAA GCAAAAAATA TTAGTAAATA ATAGTTTATA GTTAAGTTTT 630 0 

TTATTCCTAC CAATCCATCA ACTAAGTAAA GCATCAACGA TTACATAAAC GATTGATAAT 6 3 60 

ATAATTAAAA TTTTGCTAAC TATCTTATTC TCATCATTCT TAGATAACTT TGATATTTTG 642 0 

TAAGTAAGTA AATAAGACAG TAAATTAATA GCGATAATAA TACTATATTT AAGAATCATA 64 8 0 

ATCTTACAAA GAGGACATAA TTCCTGAACC TACACAAATA AGTGTTGCTG CTCCCCCAGT 6 54 0 

TATCGGACCA GTCGCAGCAG CTAATAGTAC TGCTCCAATA CAACCACCGA TTGCAGATCC 6600 

TAAATTGCCT CTTCCTCCAC TAACTATTTC GAGTTCTTCA TTATCCATAA CAGAAAATTG 6660 

TTCCATCATT TTTGTATTCA TGACAAATAC TCCTTTTTTC TTTTTTTATT TTTGTCTTGT 67 2 0 

TGTAACTTTG ATAAGTTTAG TATATCATCG TTTTTTAAAA TTTTTCATCC AGATCTTGAA 67 80 

TTGTCATCGA AACGTCTTGA ATTAGCTTTT TTATTTCAAG CCACCTCTAA ATGTTTAAAA 684 0 

AAAATAATTT CTAATCACTT TTTTACCATT CAGGAAGTTT TAATGACTAT TCAAGATTTC 69 0 0 

ATAAAATATG AACTTAGTTT TATGACATAA TAGACCTATC CACTATATGA AAGGAATTGC 69 60 

CAATGACTTC TTATAAACGT ACATTTGTTC CTCAAATAGA TGCGAGAGAC TGTGGTGTCG 702 0 

CTGCCTTAGC CTCGATTGCT AAATTCTATG GTTCAGATTT TTCTCTAGCT CACTTGAGAG 7080 

AACTTGCAAA GACCAATAAA GAAGGGACGA CTGCTCTTGG CATTGTAAAA GCCGCTGATG 714 0 

AAATGGGCTT TGAAACAAGA CCTGTTCAAG CAGATAAAAC GCTCTTTGAC ATGAGTGATG 72 00 

TCCCCTATCC ATTTATCGTT CACGTTAACA AAGAAGGAAA ACTCCAACAT TACTATGTTG 72 60 

TCTATCAAAC AAAGAAAGAC TATCTGATTA TTGGTGATCC TGACCCTTCT GTAAAAATCA 732 0 

CTAAAATGTC AAAAGAACGC TTTTTCTATG AATGGACTGG AGTAGCTATT TTTCTAGCTA 7380 
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CCAAACCCAG CTATCAACCC CATAAAGATA AAAAGAATGG TCTACTAAGC AAGCTTCCTT 744 0 

CCTCTGATTT TCAAACAAAA ATCTCTCATT GCTTACATTG TTCTCTCAAG CTTATTGGTC 7500 

ACTATTATCA ATATAGGTGG TTCTTACTAT CTCCAAGGAA TCTTGGATGA ATACATTCCA 7560 

AATCAGATGA AATCAACTTT AGGAATCATC TCAGTTGGTC TGGTTATCAC CTATATCCTC 762 0 

CAACAAGTCA TGAGCTTCTC CAGAGATTAT CTCCTAACCG TTCTGAGTCA GAGATTAAGT 7 680 

ATTGATGTGA TTTTATCCTA TATTCGCCAT ATTTTTGAAC TTCCCATGTC TTTCTTTGCG 7740 

ACACGTCGTA CAGGAGAAAT CATTTCACGA TTCACAGATG CTAACTCTAT TATAGATGCC 7 800 

TTGGCTTCTA CCATTCTTTC TCTTTTTCTG GATGTTTCTA TTCTGATTCT TGTAGGAGGC 7 860 

GTCTTACTGG CACAAAACCC TAATCTCTTC CTTCTTTCTC TTATTTCCAT TCCTATATAC 7 92 0 

ATGTTCATCA TCTTTTCTTT TATGAAACCT TTCGAAAAAA TGAACCATGA TGTCATGCAA 7 980 

AGTAATTCTA TGGTTAGCTC TGCCATTATC GAAGATATCA ACGGGATTGA AACTATAAAG 8 040 

TCGCTCACGA GTGAAGAAAA TCGCTATCAA AATATAGACA GCGAATTTGT AGATTATTTG 8100 

GAAAAATCCT TTAAGCTCAG TAAATATTCT ATTTTACAAA CGAGTTTAAA GCAGGGAACA 8160 

AAATTAGTTC TGAATATCCT TATCCTATGG TTTGGCGCTC AATTAGTCAT GTCAAGTAAA 82 2 0 

ATTTCTATCG GTCAGCTGAT TACCTTTAAC ACACTTTTTT CTTACTTTAC AACTCCTATG 82 80 

GAAAATATTA TCAACCTCCA AACCAAACTC CAATCTGCGA AGGTCGCTAA TAACCGTTTG 8340 

AACGAACTCT ATCTAGTCGA ATCTGAATTT CAAGTTCAAG AAAACCCTGT TCATTCACAT 84 00 

TTTTTGATGG GCGATATTGA ATTTGATGAC CTTTCTTATA AGTATGGTTT TGGATGAGAT 8460 

ACCTTAACAG ATATTAATCT CACGATTAAA CAAGGAGATA AGGTTAGCCT AGTTGGAGTT 8520 

AGTGGTTCTG GTAAAACAAC TTTAGCCAAA ATGATTGTCA ATTTCTTTGA ACCCTACAAA 8580 

GGGCATATTT CCATCAATCA TCAGGATATT AAAAACATTG ATAAAAAAGT CTTGCGCCGT 8640 

CATATTAATT ACCTACCCCA ACAAGCCTAT ATCTTTAATG GCTCTATTTT GGAAAACTTA 87 00 

ACCTTGGGCG GTAATCATAT GATTAGTCAA GAAGATATTC TAAAAGCTTG TGAAGTAGCT 87 60 

GAAATCCGTC AAGACATTGA AAGAATGCCT ATGGGCTATC AAACTCAGCT CTCTGATGGA 8820 

GCTGGTCTAT CAGGAGGACA GAAGCAACGA ATCGCTCTCG CTCGTGCTCT TTTAACTAAA 8880 

TCTCCTGTTT TAATACTAGA TGAAGCTACT AGCGGTCTTG ATGTCTTGAC TGAGAAAAAG 894 0 

GTTATAGATA ATCTTATGTC TCTAACTGAT AAAACCATTC TCTTTGTAGC CCATCGTCTC 9 000 

AGTATAGCCG AACGAACCAA CCGTGTCATT GTTCTTGACC AGGGGAAAAT CATTGAAGTT 9060 

GGTA g 064 



WO 98/18931 



PCT/US97/19588 



250 

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7780 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

CTCCATTTTT TTGATTTCAT AAATAAACAA CCTCTCTGTT AATTTTGTAT AATTATAACG 60 

ATATCCAAGT TACTTGTCAA GTGTTTTTTA AATTTTTATC TCAAAAATAT TTTTTCGTTC 12 0 

AAAAAAAGGA GCCATCAGTT GATTTCAAGC TCCCTTTTAT ACAGAATTAA ACTATTTTAT 18 0 

AGTTCGACAA TCTTACCTGT TTCAAAGTAG ACAACCCATT CACAGATATT TTTAGCATAG 240 

TCACCGATAC GCTCCAAGTA GGAAATAACT TGGAAATAAT CACGACCCGT AACAATGGCT 300 

TCTGGATTTT TCTTAATCTC TTCAGTCGCA AGGTCACGGA TAGTTTCAAA ATAGTGGTTA 3 60 

ATTTGCTCAT CCATGGAGGC CACCCGGTAT GCGTCGTCAA CAGAACCATT AAGATAAAGA 42 0 

TCAAGTGCTG CTTCCACAAC GCTTTTAACT TCACGTCCCA TTTTTTTAAT TTCTTCCTCT 480 

ACAGCTGGAA TGCGCTCTTC CCCCTTCATA CGGATGGTTG CCTGGGCAAT GGCTACAGCG 540 

TGATCCCCCA TACGCTCCAC ATCTGATACA GCCTTAAGGA CAGTCAAGAC TGTACGCAAA 600 

TCTTGAGAGA CTGGTTGTTG GAGTGCGATC ATTTCAAATG ATTTCTTTTC CAGTTTCACT 660 

TCGTATTCAT TTACTTCTGC ATCATCTTCG ATGACCTCTT TTGCCAGGTC ACGGTCATGC 72 0 

GTGACAAAAG CACGTACCGT ACGATTGATT TGTGAGAGCA CTTCTTGTCC CATAGCGTAG 780 

AACTGGTTAT GTAATTTCTC TAAATCTTCT TCAAATTGAG ATCGTAACAT CTTTCATCTC 84 0 

CTTATCCAAA TTTTCCTGTA ATATAGTCTT CCGTTTCCTT GTGTTGGGGA TCAAGGAACA 90 0 

TCTGCTTGGT ATCATTAAAT TCAATCAAAT CTCCATCTAG GAAAAATCCT GTCTTATCAG 960 

AGATACGTGA AGCTTGCTGC ATGGAACGGG TTACCAGAAG CATGGTGTAC TTGTCTTTTA 1020 

GACCATACAA GGTTTCCTCA ATTTTACCAG CTGAAATCGG ATCCAAAGCC GAAGTTGGCT 1080 

CATCCAAGAG GATGATTTTA GGACTAGTTG CCAAGACACG GGCCACGCAG ACACGCTGCT 1140 

GTTGACCACC TGACAATCCA ATAGCTGAAT CATATAGACG ATCCTTGACC TCATCCCAGA 1200 

TAGAGGCACC TTGCAAGGCT TTTTCTACGG CTTCATCCAG AACCTGCTTA TCCTTAATTC 12 60 

CATTGATACG AAGCCCGTAG ACAACATTCT CATAGATAGT CATAGGGAAA GGATTAGGTT 1320 

GTTGGAAAAC CATTCCGATT TCCTTACGTA ATTCAACCGT ATCTGTACGC GGACTGTAGA 1380 

TGTTGTGACC ATTGTACACC ACGGATCCAG TTGTGGTCAC CTCTGGATTG AGATCTCCCA 1440 
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TGCGGTTGAG AGACTTGAGG 
TAATTTCCTT AGGTTGGAAA 
AAACGGACAG GTCTGATACC 
AGTGACCAGA TACATAGTCA 
TCTTGTCATA CTCAATCAAA 
CAGCCTGCTG CATATTATGC 
TGGTCTCTTC TAGTTGCATG 
AGAGGATATC TGGCTTAACA 
CTGATAAGGT CAAGGCTGAC 
GACGAAGGGA GGTTTCTACG 
CATGCGCAAA GGTAATATTA 
CCATTCCAAT GTGTTTACGC 
CACGATAGAG AATCTGCCCA 
GACTGCGTAA GTAGGTAGAT 
TTCTTTCAAA TTGCATATCA 
CATCCTTAGT AGAAAGGGCT 
AGTTATATGT TGACATGGCT 
CCGAACTTAC GAGCTCCAAA 
CCTGCTGATA CAATGGTTCC 
ACAGCCAAGG TTTCTGCTTG 
C AGT TAG AC C AGTCAAGAGC 
TCGCCAAAGA TACGACCAGA 
GGAATAACAA CATGAACCAC 
CGTTGGGTAT GGTGAACGTG 
TTAAAGACTG TCAAGGCCAA 
ACAAAGATCA AGTAACCAAA 
ATACAAGTCC GCACAAAGTT 
CCAGCTCCCA TAGAAAGAGG 
TTGTAAAGCT GAATGCCAAT 
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AGGGTTGACT TCCCTGATCC AGATGGACCA ATCAAGGCTG 1500 

GATAGGGAAA CACTATTCAA AGCCTTCTTT TTATTATAAT 15 6 0 

TGTAAAATCG CATCTGTCAT ACGGTTTCCT TTCTAACCAA 1620 

TTGGTGGACT GTAGCTTGGC ATTTTGGAAA ATAGTTGCAG 1680 

TCACCCAAGT AAAAGAAGCC TGTATAGTCA CTTGCACGAG 1740 

GTTACAATGA TGATGGTAAA GTTTTTCTTG AGCTCAAACA 1800 

GTCGCAATCG GATCCAAGGC TGAGGCTGGC TCATCCATTA 18 60 

GAGATGGCAC GAGCGATACA GAGACGTTGT TGCTGACCAC 192 0 

TTGTGGAGAT CGTCTTTAAC CTGATCCCAG AGGGCAGCCT 1980 

ATTTCATCTA GGACTTGCTT ATCCTTAACT CCAGCACGTT 204 0 

CGGTAAATTG ACTTAGCAAA TGGATTGGGA CGTTGAAAAA 2100 

ATTTCATAAA CGTTGATTTC TGGACGGTTG ACATCAATTC 2160 

GTTACTTTAG CAATATCAAT AGTATCATTC ATGCGATTGA 222 0 

TTCCCCGATC CCGACGGGCC AATCAAAGCT GTAATTTTAT 22 8 0 

ATCCCCTTAA TGGATTCATT TTTACCATAG TAAACATGGA 23 4 0 

ACTTTTTCTT CAGGAAAGGT AAGGATATGC TTCTCATCCC 2400 

TCTCCTTTAG GCAGCGGTTA ATTTCTTGTG TAGATAGCTT 24 60 

GTTAAAAATC AGGATAAAGA TCAGGAGCAC AGCGGCAGAA 2520 

ATCTGGAATA GTGCCTTCAC TATTGACTTT CCAGATATGG 2580 

ACGGAAGATA GAGATGGGGC TAGTCACACT GAGGATATTC 2 64 0 

TGGCGCCGAT TGCCCTGCTG TATAGATCAG AGCTGCAGCT 2700 

TGCCAAGACG ACACCCGTTA CAATACCTGG AAGCGCTTCC 27 60 

TGTCTCCCAG CGAGAAATCC CAAGAGCCAG ACCAGCCTCA 2820 

TTTCAAACTA TCCTCTACAT TACGCGTCAT CTGAGGCAAG 28 80 

GGCACCTGAA ATGATTGAAA ATCCATACTC AAACTGGACT 2940 

GAGACCCACC ACCACTGATG GTAAAGAGGA CAAAATTTCA 3000 

GGTAACAGGA CCTTTTTTAG CATATTCAGC CAAGTAAATC 30 60 

TACAGAAATA ATCAAGGTAA TGACCAATAG GAAAAAGGAA 3120 

CCCACCACCT GCTTGAAAAG CAGAAGACCT TCCAGTCAAG 3180 
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AAAGACCAAG AGATATGGGG CAAGCCCCGA ACCAAGATAT AGAGAATCAA GGAAGCCAAG 3240 

ATTGTCACAA TGATGCTAGC AATCGTATAG AGGACAGCTG TTGCAAGTTT ATCTAATTTC 3 300 

TTAGCGCGCA TAATTTTTCT TTCCTCTTTC TTTCGTAATC AATTTAATCA CACTGTTAAA 3 3 60 

AACTAAGCTC ATCAAGAGCA GTACCAAGGC CAGTGACCAG AGAACATTAT TATTTACAGT 3420 

TCCCATGACA GTGTTCCCAA TTCCCATAGT TAATATAGAA GTTAAAGTTG CAGCTGGTGT 3480 

GGTCAAGGAA GTTGGGATAA CAGCTGAGTT TCCGACAACC ATCTGGATAG CTAGAGCCTC 3 540 

ACCAAAGGCA CGCGCCATCC CAAAGACCAC TGCAGTGAAA ATACCAGAAC GGGCCGCCTT 3 600 

CAAGATCACA CGCCAGATAG TCTGCCAGCG AGTGGCTCCC ATAGCGAAAC TGGCTTCACG 3 660 

ATAATAACGA GGAACCGCAC GCAAGCTATC CGTTGTCATA AAGGTTACGG TCGGCAAAAT 3720 

CATGACAAAG AGGACGGAAA TCCCTGACAA AATCCCAAAA CCAGTCCCAC CAAAGACACT 3780 

GCGAACAAAG GGAACGACGA CTTGCAAGCC AATAAATCCG TACACTACTG AAGGAATCCC 3 840 

AACCAGGAGT TCAATAGCTG GTTGCAAAAT CTTCGCCCCT TTTGGTGATA CTTCGGTCAT 3 900 

AAAAACTGCT GCACCAATAG CAAAGGGTGT TGCGATAAGG GCTGAGAGAA TGGTAACGAT 3 9 60 

AAAGGAACCC AAAATCATAG GAAGGGCACC AAATTCTTTA CTAGAAGGAT TCCAAGTTCC 4 020 

TCCCAAAAGA AAGTCAAAGA TATTCACACC ATTGACAAAG AAGGTCGACA AGCCTTTTTG 4 080 

CGCTACGAAA ACCAAAATCA TGGCCACAAG GATGACTATC AAAGAAAGAC AGGCAAAGGT 4140 

CAAACCTTTT CCTAATTTCT CCAGACGAGA ATTCTTTGAT GGAAGCAACA TTTTCTTAGC 4200 

TAATTCTTCT TGATTCATTA TTGTCTCCCT TCCAACACTG TCACAGTTCC GGCAGCATCT 42 60 

TTTTCAACCT TCATTTCCTT AATCGGAATA TACTTCAATC CTTTGACAAT CCCTTCTTGG 4320 

GTCTCATCCG AGAGAACAAA ATTGAGAAAT TCTGCAGCCA ACTCATTGGG CTGCCCCAAT 4 380 

GTATACATAT GCTCATAAGA CCACAAGGGC CAATTATTGC TACTTATATT TTCTGGACTT 4440 

AAGTCATAGC CATTCAACTT CATGCTTTTG ACCGAATCAT CTATATAGGT AAGAGATAAA 4500 

TAAGAGATAG CTCCTGGACT TTTTGATACG ATTGATTTTA CCGCTCCATT TGAATCCTGC 4 5 60 

TCCTGACTTT GCATGGCAGA CTGACCTTCC ATAATGACAG TATCAAAGGT AGCACGAGAG 4 620 

CCAGAGCCGG CTGCCCGATT GATAACAGAG ATGGGTAAGT CCTTACCACC AACCTCTTTC 4 680 

CAATTGGTTA CCTCACCTAT GAAGATTTGA CGAAGTTGCT CTGTCGTTAG GTTATCAACA 4740 

TCAACCTCCT TATTGACAAT CAGAGCCAAG CCAGCTACCG CGACCTTGTG GTCAACAAGA 4 8 00 

GCAGAAGCAT CAATTCCGTC TTTTTCCTCA GCAAATACAT CTGAGTTTCC TATATCAACT 4 8 60 

GCCCCAGACT GAACCTGGGA CAAGCCTGTA CCAGAACCTC CCCCTTGGAC ATTGACCGTT 4920 

TTTCCAACAT GGATCGTGCC AAATTCATCT GCCGCTACTT CAACCAAGGG. TTGCAAGGCA 4980 
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GTTGAGCCAA CAGCCGTTAT GGATTCTCCA CGATCAATCC AGCTAGCACA GCCTACTAAA 5040 

CAAGCCGTCA GCCAAAAAGC GATAAGAGAC AGAGCAAGCT TTTTTCTTTT TTTCACTGTT 5100 

TTTCTCCTCG AAAATAATTA TGAATACTGT GAATTTTTTA AGTAGTTCTT TATGAGTTGA 5160 

CGCATGAATT CTTACCAAAT TTCTGCGCAA TTGATTATTT ATATAATATA GGCTATATTA 5220 

CTCTTTCCTA ACCTCCTTTT TTCATATGTG GATAAAATCT CTTGTCTATC CCTTCCCCCA 5280 

TTGTCACCCA TTATAGTCAT TTCGTGTCTC TTTTTCCCCT TTTTAATGCA AGGGAAATTA 5340 

CTCTCCTTAG ATGATAATCC AAAAGCTAGA AAGGTATCTC AAACCTCTCT ACTCTCCCAG 5400 

ACTAGTTTAC AACTAAAAGG AAAAGATTCT ATTTTATGAG AAATCTAGTT TACAAGCGGT 5460 

AAGAACGCTA ATAACTAAAC TTCTTGTACT CTTTGAAAAT CTCTTCAAAC CAGTGTTTTG 5520 

AG C TAT C T AT GGCTAGCTTC CTAGTTTGCT CTTTGATTTT CATTGAGTAG TAAAACTACA 5 580 

TGTAATGGCA ATCAAGATAT CAAGAATCAT CCTACTAAAA AAATCCATAC TTTCACTATA 5 640 

ACATAGAATA AGATATTTGA CTAGCATTTT CATTTGAATC TGAGGCCTTT TGGAAAATAA 5700 

TTTTTCAAAA CATTTCCAGT AACCTTTGCA AAGCCCAAGC CATTGCCTTT AACCAAAACT 5760 

TGGTACCAAC CATTTGGCAG ACTTTCTGCC AGCTGAACGG TTTCTCCAGC CGCATACTTG 5 820 

ACAAACGCTT CTTGGCCAAT TTCAACCGAC TGTTCGACCT GACTCGGTTT CAAGGCTAAA 5 8 BO 

CCAAGAGCGA AACTGGGCTC AAAGCGTTTC TTCTTAAAAG TACCCAGATG CAGTCCATTG 5940 

CGAGCAATCT TGAGCTTCCA TAAATCTGGC AAAAGTTCTG GCAAGAGATA AAGCTGGTCT 6000 

CCAAAAATCT GCAAGATACC CGGTAGATTG ACCTTCAAAT GGTTTTGGGC AAATTCCTGC 6060 

CACAAGGCAA CTTGTTCACG GCTGAGGTTA CTCTTACTTG CCTTAAATTT AGGAGCTGGA 6120 

TTGTTACCCT TAAACTGTAG ATGGGCAACA AACTGACCCT CTCCCTTAAA CTGATGAGGA 6180 

TACATCCGAG CCGTTTCTGG CAGGTCAATA CCAGCTACCA TTCCATTGAT ATGCTCTACT 62 40 

GGCAACAAGT CAAAATCATA CTCTTCCAGC AACCAATTGA CAATCTCTTC GTTTTCCTCG 63 00 

GGTGCCCAGG TACAGGTCGA ATAAACCAGA TGACCACCTT CAGCTAACAT GGTCACTGCA 63 60 

TCCTCCAGAA TTTCTCTTTG CAAGCTAGCA CATTGACTCG GATAATCTAA GCTCCAATAG 6420 

TCCATAGCAT CAGGTTGCTT ACGAAACATT CCTTCACCAG AGCAAGGGGC ATCAAGAACG 64 80 

ATTAAGTCAA AATAGCCTTT AAAGACCTTG ACCAAGCGGT CGGCAGATTC ATTGGTCACC 6540 

ACGACATTTG TCGCTCCAAA ACGCTCCATG TTTTCAACCA AAATCTTAGC CCGTTTGCTT 6600 

GAAATTTCAT TGGAAnCAAG TAGCCCCTCC CCTGCTAGAT AGGCTGCCAG TTGAGTTGAT 6 6 60 

TTGCCCCCCG GTGCAGCAGC CAAGTCCAAG ACCTTCATAC CAGGACTGGG TTGGGCTACT 672 0 
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TGAGCCACCA TTTGAGCAGC AGGTTCTTGC GAATAAACTA AACCTGTAGC ATGCTCAGGC 6780 

GATTTCCCTG AAACCTTCCC ATAGTGGCCC CAAGGGGTTT GAGTAATGGC AT C AG AAAAG 6840 

GAAAGTTGCT CTTCTTTTAA GGGATTGACC CGAAAGGCCG AAACCGCTTC CTCCTCAAAA 6 900 

GAGGCAAGAA AATCTCTTGC CTCATCTCCT AGTATCTCTT TATATTTTTC AACAAATCCT 69 60 

TCTGGAAATT GCATTTAAGT TCTTTTCCTT TCGTAAATAT AGGACTGAAT TTCCTCCTGC 7020 

ATCTCAAGAG GCACCATCAT GACCGGCTGT CTGGTTTGAA AATCAGGAGC TTCACCAAAA 7 080 

AGGGTCACAA CCCGATAGCC CAGACTTTCC CCTAAAATAC TAGCTGCGGC ATAATCCCAT 7140 

GGTTGCAGAT AAGTGAGATA GGTCAACAAA CGCCCTGACA AAATCTTGGC AAAACT AAT G 72 00 

GCCGCACTTC CATAGACACG AACACCAAGA ACCGCTCGGC TCAAATCAGC CAGCCCCCAT 72 60 

TCATTGGTTT CCAGCATACC ACTATTCCCT GCAATGAGAA AATCTCCAAG TGGTTTAGTT 73 2 0 

TTAAAAGGAG CTAGGGACCT ATCATTTAGA CAAACTGGAA ATTCCCCACC ACCGTGGTAA 7 380 

CAATCCCCTT TGACCACATC ATAAATCAGA CCAAACTGTC COTGACCATT TTCAAAATAA 744 0 

GCCATCATAA CAGCAAAATC TTCCTGCTGG GCTACAAAAT TATTGGTACC ATCAATGGGA 7 500 

TCAATGACCC AAACCTTGCC CTCTTGAACC GAGGCTCGCA GACAACCTTC TTCAGCACAA 7 56 0 

ATCTTATCCT CAGGATAACG GGACAAAATC TCACCAACCA AGAGTTCCTG AACTTCTTTG 7 62 0 

TCCAGTCTGG TCACCAAATC TGTTGGAGAG GACTTGGTTT CAACACGCAA GTCTTCCTGC 768 0 

ATATGGTCAA GAATGTACTG ACCTGCTTTC TTAACAAGCT CTTTAGCAAA TTCAAATTTA 7740 

CTTTCCAAGA GAAATCTTTC CTTCCCCTTT TTCTTTGGGG 77 80 
(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4820 base pairs 

(B) TYPE : nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

GTAATGATAT AGGAACACCA GGTGACCTGA TGGGACGTCG TAAGCCTATG AACTACTAGC 60 

TGCTAAAGGC TTTAAAGATG GTATGGTACC ATATATCTCA AACCAATACG AAGAAGAAGC 120 

CAAACAAAAG GGCAAGACAA TCAATCTCTA CGGTAAAACA AGAGGTTTGG TTACAGATGA 180 

CTTGGTTTTG GAAAAGGTAT TTAATAACCA ATATCATACT TGGAGTGAGT TTAAGAAAGC 240 

TATGTATCAA GAACGACAAG ATCAGTTTGA TAGATTGAAC AAAGTTACTT TTAATGATAC 300 

AACACAGCCT TGGCAAACAT TTGCCAAGAA AACTACAAGC AGTGTAGATG AATTACAGAA 360 
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ATTAATGGAC GTTGCTGTTC GTAAGGATGC AGAACACAAT TACTACCATT GGAATAACTA 42 0 

CAATCCAGAC ATAGATAGTG AAGTCCACAA GCTCAAGAGA GCAATCTTTA AAGCCTATCT 480 

TGACCAAACA AATGATTTTA GAAGTTCAAT TTTTGAGAAT AAAAAATAGT GTCTACTATT 540 

AGGAAATAAA GTTTAAAAAG GTGATGAAGA ACAAACCAAG ATTCAAGCAG GAATTCCTAC 600 

TGATAATGAA GTAAGTTATG ATCTTATTTA TCAGCAGGAA ACTCTTCCTG CAACAGGTTC 660 

ATCAACTTCT GAGCTTACAG CTTTAGGCCT ATTAGCTGTT GGTAGTT TAG TTCTTTTGGT 720 

TCATAATATG ACGGGAACAG TTTTTTGCTC CCTCTGAAAA GTCATCATTT GATGGCTTTT 78 0 

TTCTATATAG GGTAAAAGAT AGGGTAAAAG GCTATCATCG GACAAAATAA AGAAGGCATG 840 

ATATAATATA AAGTAGATTT CTATGTCATA AAACAAGAAC TGTTTGGACA TCATTCATTT 900 

GAAAACTCTC TATGTTCAAA CAATAGTAAA ATAAAATAGG GGATCTAAAT CCTTGCTATG 960 

AAAGGAAAAA ACTCAATGGC TACTATTCAA TGGTTTCCTG GTCACATGTC TAAAGCTCGT 102 0 

CGACAGGTGC AGGAGAATTT AAAATTTGTT GATTTTGTGA CGATTTTAGT AGATGCACGC 1080 

TTGCCTCTAT CTAGTCAAAA TCCTATGTTG ACCAAGATTG TTGGTGATAA ACCAAAACTC 114 0 

TTGATTTTAA ACAAGGCCGA CTTGGCTGAT CCAGCAATGA CCAAGGAATG GCGTCAGTAT 1200 

TTTGAATCAC AAGGAATCCA GACGCTAGCT ATCAACTCCA AAGAGCAAGT GACTGTAAAA 1260 

GTTGTAACAG ATGCGGCCAA GAAGCTCATG GCTGATAAGA TTGCTCGCCA GAAAGAACGT 1320 

GGGATTCAGA TTGAAACCTT GCGTACTATG ATTATCGGGA TTCCAAACGC TGGTAAATCA 13 80 

ACTCTGATGA ACCGTTTGGC TGGTAAAAAG ATTGCTGTTG TTGGAAACAA GCCAGGGGTC 1440 

ACAAAAGGTC AACAATGGCT TAAAACCAAT AAAGACCTGG AAATCTTGGA TACACCGGGG 1500 

ATTCTCTGGC CTAAGTTTGA GGATGAAACT GTTGCACTTA AGTTGGCATT GACTGGAGCT 15 60 

ATCAAAGACC AGTTGCTTCC TATGGATGAG GTTACCATTT TTGGTATCAA TTATTTCAAA 1620 

GAACATTATC CAGAAAAGCT GGCTGAACGC TTCAAACAAA TGAAAATTGA AGAAGAAGCG 1680 

CCTGTGATTA TTATGGATAT GACCCGCGCC CTCGGTTTCC GTGATGACTA TGACCGTTTT 1740 

TACAGTCTCT TCGTGAAGGA AGTCCGTGAT GGCAAACTCG GTAACTATAC CTTAGATACA 1800 

TTGGAAGACC TCGATGGCAA CGATTAAAGA AATCAAAGAA TTCCTTGTGA CAGTCAAGGA 18 60 

GTTAGAAAGC CCTATTTTTT TAGAGCTTGA AAAGGATAAT CGCTCAGGAG TTCAAAAGGA 192 0 

AATCAGCAAG CGTAAAAGAG CCATTCAAGC TGAATTAGAT GAAAATTTGC GCTTGGAATC 19 80 

CATGCTTTCT TATGAAAAAG AACTTTATAA GCAAGGATTG ACCTTAATTG CAGGTATTGA 2040 

TGAGGTTGGT CGTGGTCCTC TTGCTGGTCC TGTAGTCGCT GCGGCCGTTA TTTTATCTAA 2100 
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AAATTGTAAG ATTAAAGGTC TCAACGACAG CAAGAAAATT CCTAAAAAGA AACATCTGGA 2160 

GATTTTCCAA GCCGTTCAAG ACCAAGCCTT GTCGATTGGA ATTGGTATCA TAGATAATCA 2220 

GGTCATCGAC CAAGTCAACA TCTATGAAGC AACCAAACTA GCCATGCAAG AAGCAATCTC 2 2 80 

CCAGCTCAGC CCTCAACCAG AGCACCTTTT GATTGATGCC ATGAAACTGG ACTTGCCCAT 2 340 

TTCACAAACC TCCATTATCA AAGGAGATGC CAACTCCCTC TCTATCGCAG CAGCATCTAT 2 4 00 

AGTAGCCAAG GTAACACGTG ATGAATTGCT GAAAGAATAC GATCAGCAGT TCCCTGGCTA 24 60 

TGATTTCGCT ACTAATGCAG GATATGGCAC AGCTAAACAT CTGGAAGGCC TCACAAAACT 2520 

AGGAGTTACC CCAATTCACC GAACCAGCTT TGAACCCGTT AAATCACTGG TTTTAGGTAA 2580 

AAAAGAAAGT TAATTGAAAG GAAATAACAT GGAGGAACAG TCGGAAATAG TCCGTTCTAA 2 64 0 

GAAAGAATTC GCCTTTGCAT CCAGCACTAT ACTATCCCAA GTTGGTCGAG GAATCATTGT 27 00 

CGGCCTCATC GTTGGAATTA TCGTCGGATC CTTTCGTTTC TTAATTGAAA AGGGCTTCCA 27 60 

CCTGATACAA GGAGTTTATC AAGATCAAGG GTACTTAGTG CGCAATCTTT TTGTACTGGT 2 82 0 

TTTGTTTTAT ATACTCATCT GTTGGCTCAG TGCCAAACTA ACACGGTCAG AAAAAGATAT 2 880 

TAAAGGCTCA GGAATTCCTC AAGTCGAAGC CGAACTGAAA GGCCTCATGT CCCTCAACTG 2 94 0 

GTGGGGCATT CTTTGGAAAA AATATGTGCT AGGTATTCTT GCTATTGCCA GTGGACTCAT 300 0 

GCTGGGTCGA GAGGGACCCA GCATTCAACT TGGAGCAGTT GGTGGTAAAG GAATTGCCAA 3060 

GTGGCTCAAA TCCAGTCCAG TAGAGGAACG TTCCTTGATT GCCAGTGGAG CTGCAGCAGG 312 0 

TTTAGCCGCA GCCTTTAATG CTCCTATTGC AGCACTTCTC TTTGTTGTAG AAGAAGTCTA 3180 

TCACCATTTT TCGCGCTTTT TCTGGGTCTC AACTCTAGCA GCCAGCATCG TAGCAAACTT 324 0 

TGTGTCTCTA CTCATGTTCG GTTTGACACC AGTATTGGAT ATGCCAGATA ACATTCCTCC 3300 

CATGACCCTA GATCAGTATT GGATATATCT CGTCATGGGA ATTTTCCTTG GATTTTCAGG 33 60 

TTTTCTCTAT GAGAAAGCTG TATTAAACGT TGGAAGAGTT TATGACTTGA TTGGTCAAAA 342 0 

AATCCATTTG GATAGGGCTT ATTATCCCAT CTTGGCTTTT ATCCTTATCA TACCAGTCGG 348 0 

AATCTTCTTA CCTCAAATCA TTGGTGGCGG AAATCAGCTT GTCCTTTCTT TAACTGAACA 354 0 

AAATTTTAGT TTCCAAGTTT TATTAGCTTA CTTTTTAATC CGCTTTATTT GGAGTATGAT 3 600 

TAGCTATGGA AGTGGACTGC CAGGAGGAAT TTTCCTCCCC ATTTTAGCTC TTGGTTCTTT 3 660 

GCTTGGTGCC TTAGTTGGTG TTATCTGTGT CAATCTTGGA CTTGTCAGTC AAGAGCAATT 3 720 

CCCTATATTT GTCATTCTAG GAATGAGTGG CTATTTTGGA GCCATATCAA AAGCTCCCTT 3780 

AACCGCTATG ATCCTCGTAA CTGAGATGGT AGGAGATATT CGCAACCTTA TGCCACTTGG 3 840 

TCTTGTCACT CTTGTTTCTT ATATTATCAT GGATTTGCTC AAAGGTACGC CAGTCTATGA 3 900 
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AGCCATGCTG GAAAAAATGC TTCCAGAAGA AGTATCTAGC GAAGGAGAAG TTACACTTAT 3960 

CGAAATACCA GTTTCTGATA AAATTGCTGG GAAACAAGTT CATGAACTCA ACTTACCACA 402 0 

CAACGTCCTC ATCACAACTC AAGTCCATAA TGGCAAGAGC CAAACAGTTA ACGGCTCAAC 4080 

CAGAATGTAT CTGGGTGATA TGATTCACCT GGTTATTCCA AAAAGTGAAA TTGGAAAAGT 414 0 

CAAAGATTTG TTGTTGTAGT ATGAGTATTT ACATAATTTA TGTTATGTAA ATGATCAGTT 42 00 

TGATTTATTT AGAAAACCGA TTCTCAGGAA TGAGATCGGT TATTTTTTAC TGATGAGGAA 4260 

TTTTACATAT AAATAATTGA ACTTTATTAA AAATAAGACT ATAATTAAGT TAGAAATGAT 432 0 

AAAGTATAAA GCTAGAAAGG AGTTTACTGT ATCAAATCTG TACAGTAAGA TTAAAATCAT 4380 

GAAAAAGAAA ACAATAGCAA TTATATAGAG AAATGAAATA GAAATAGGAT AAAACAATCA 4440 

GGACAATCAA ATCAATTTCT AGCAATGTTT TAGAAGTCCA GATGTACTAT TCTAGTTTCA 4500 

ATCTATTATA CAATGTGTTT TGTATCTCAT AGCTCCTTAT ATAGCTCTTC AGTTATGTAG 4560 

TATTAACAGA AGTTTAGTGG GTGAGATTTT TATTATTTTC CTTATTCTGT TTTGTTTGTA 4 62 0 

GGTCTAAGTC TTTTTATCAC TTTGAAAAAC TCCTATAACA TCTTTCCGAA AAACTATAAT 4 680 

TTTCTTGAAA AATATACAAG TCTATGCTAT ACTACTAGTA TACTTACTTA TGGAGAAAAT 4740 

ACATGAAACG TGAGATTTTA CTGGAACGAA TCGACAAACT AAAACAACTC ATGCCCTGGT 4800 

AAGTTCTGGA ATACTACCAA 4820 
(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH : 21338 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION : SEQ ID NO: 20: 
CTACGACATC ATGATTAACA GTCATGCGCT ACTACCAACT GAGCTATGGC GGATAAAATA 60 
GTCCGTACGG GATTCGAACC CGTGTTACCG CCGTGAAAAG GCGGTGTCTT AACCCCTTGA 120 
CCAACGGACC TTCTATCTGT AGCAGATATA ACCATTATAT CAATTTCTTG CTAATTGTCA 180 
ATCACTTTTG AGATTTTTTC TCTAAAATAT CTTTTAATTT TCTAATTTTT AATCTTGAAA 240 
TAGGACAACG ATGGTCTTCA TAGAAAACAA TTTCTAAGTT TTTTCGATCA ATTTCTCTGA 300 
TATTACCTAT ATTTACCAAA AATGACTTGT GAGGAGAATA AAATCGCTGA GTATGTTTGT 360 
CCTTTTCCTG AATATCTGTC ATGGTACCAT AAAACTCTTT TGCAAAATTC TTACCAATAA 42 0 
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TGCGCAATTT ATGAGATACC CCTGTTGTTT CAATATACAA AATATCATGG TAAGGAATTT 480 

TTAAATCATT TCCCTTGTAA TTGTAGTCGA AATAATCTAC AACATCTTCA TTTTCAAGTA 540 

ACATACTCTT CGTGTAGAAG ATATTTTGCT CAATTCTCTT CTTAAACATC T C AT CAT TG A 600 

TATCCTTATC AACAAAATCT AGGGCTGATA CCTGGTATTT ATAGGTTAGA GTCGCAAACT 660 

CTGATCGACT AGTGATAAAG ACGATAATAG CGTAAGGATT GTAATGACGA ATGAGCTGAG 720 

CCACTTCAAA TCCCTTTTTC TCAATTCCAT GAATATCGAT ATCTAGGAAA TAAAGCTGAT 7 80 

TTACTTCATC ATTTTCAATG TATTCTTCAA ATTCACGGAC TTTTCCCGTT GTCTTGTATG 840 

ATATTGGAAT ATTCGATTCT TTCGAAATTT CATCCAATAT TCTCTCTAGT CTCACTTGAT 900 

GTTCAATAAC ATCTTCTAAA ATTAAAACTT TCATTCAAAT TCCCTCTTAA ATCTAATGAT 9 60 

TTGTCTAAAT GTACTGCCTT CCATCTCTGT TTCTAAAATA ATATTGTTGT ACTTATCTAG 102 0 

TAGTTCTTTC ACATTATTTA ATCCGACTCC GCGATTTCTT CCCTTAGTGG AGAATCCTAA 1080 

GGCAAATAGA TCTCCTGAAG GAGTCATCGT CATTTTACAT GAATTCTGAA TCACAATAAC 1140 

TGTTTCAGTT TCCATCTTAA TAACTGCTAC TTCCATCTGC TTTTTATAGC TATCAGCCGA 12 00 

TCCTTCGACA GCATTATTCA ATAAAACGCT CATGATACGA ACCAAATCCA ATAGTTCAAT 12 60 

TGGAAGCTTG GTAATCGTAT CTTTTACTTC CAGTGTAAAC TCTACACCAT TATTTCGAGC 132 0 

ATAGACAATT GACTGAGCAA CCAAACTTCG TAAAGCTGAG TCTTCTATGT TGTTCAAATC 13 80 

AAAGTAAGTG TACTTATCTG AACGCAATTT ATGATTTGCT TTGACTAAAA CTTCATTGTA 1440 

AATTCTGTCA ATTTCCTGTA AATTACCACT GTCAATTGCC ATCTGCATGC TGACAAGCAT 1500 

TCCAGCATAA TCATGTCGAA AACCACGGAT TTCATTATAC AGACCAACAA TTTCATCTGT 1560 

GTAATTCTGT AAATGTTTCT GTTCAAATTT CTTCTGCTTC AAAGCAATCT CTTTCTCCAT 162 0 

TTGAACTTTA TGAGAATTCA TTGCAAAGAA GGTCAAAAGG AGAGAGATAA AGACAATAGA 168 0 

TGACAAAATA CTTCCAAAAC TATTCAAATG TTTAATCGTA CTTACCATAT CTGAAACGAA 174 0 

AGATACAATA TGTAGCAATA GTAAAGCAAA AAATACTTTT TTCAAGAAAG GATAAAGGTA 180 0 

GTCCTTGTCA AAATAGGCTA GTTCCAAATG GAAATAGTAA ATGATTTTTA ATGTAACAAA 1860 

ATAGGTTAAC ACCGTCACAA CGAAAAAGAA TGGGAAATGA TATTGTAAAA CAAAATTATC 1920 

TCCTGTTATA GAGGAGAAAA TTACGGACAG AAAGTTATGA GTGCTCTCAT ATAAAAGAGA 1980 

TAGTAGTAAA CTTAGGAATA GTCCTCTATC CCTCTCATAC TGTTTCATCC ATCGAAAATA 2040 

GGAATATAAG CCCAAAGGAA ATAAAAATCT TTCAATCCCT ATTTTATCTA AATATAGAAG 2100 

ATAAAAGGAA AATTCAAGTA CTATTTCAGT TAGTAATGTA TAAGCACCAA AAACGTATAA 2160 

TTCTTTTCTA TTTATTCGAC CTTTACAAAT TAAACGGTAA CTGTGACTAA TAATTAAAAA 22 20 
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ATGAACAATA ACTGTCCCAA ATCCAAGTAA ATCCATTACT CTTTCTCCTT AT T T CAT T AC 22 80 

TTTTTTCGTA GGAAAAGAAA ATCAAGGATG ATTCTTGAAA TCCTCATCTC CCCACCTTTA 2340 

ATCTTTTGTA AGTCTTTTTC CTTCAAAGCT ACAAACTGTT CCAATTTAAC TGTGTTTTTC 2400 

ATAATAAAAT CTCCTAAAAT GTTTTTTCTT GTAAGCTAAC TTACAAAAAC CATTATACAA 24 60 

AATGGAATTT CGTTTTAGAT AAAATTCTCT CAACTGTCAT TTTTTTCTCC CAAAGTGTAC 2520 

TTTTTTAAGA AAAAAGCCGG GAAAATTCCC AGCTTTGCTA TTATATTGAT CCCAGCAGGA 2580 

TTCGAACCTG CGACCGTTCG CTTAGAAGGC GAATGCTCTA TCCAGCTGAG CTATGAGACC 2 64 0 

TAATACAATT ATTCTACCAA AAATTCAATT AAAAGTCAAT TTTCTATTTA TGGTAGGGGA 27 00 

ATCCCTGCTG AATCGTAAAA GCGCGATAGA TTTGTTCAAC AAGAACTAGT CTCATTAACT 27 60 

GATGGGGTAA GGTTAGGCGA CCAAAACTGA CAGAAAGATT GGCTCTATTT TTTACAGATG 282 0 

ATGATAATCC TAAACTTCCC CCAATAATAA AAGTAAGAGT AGAAAATCCT TTTATAGAAG 2880 

TTTCTTCTAA CTGCTTACTA AATTCTTCTG AGAAGAAAGT TTTCCCTTCA ATGGCTAACA 294 0 

CAATAACGAA ATCACGGTCA GCAATTTTTG ATAAAATTCT CTGACCTTCT ATTTCTAAAA 3000 

TCTTTTGATT TTCTGATTCA CTGGCCTTAT CTGGTGTTTT TTCATCTGAT AACTCAATCA 3 060 

TTTCAAACTT AGCAAATCTA GAAATTCGTT TTGAATACTC TGCGATACCA TCTTTTAAAT 312 0 

ACTTTTCTTT CAGTTTCCCA ACTGTTACAA CTTTAATTTT CATGACTCTA TTCTAACATA 3180 

TTCTCTATTT TTTCACATCT TATTCACAAA ATAAAAAATA GATTTCAATT AAGAAAATCA 3240 

CAATTTCAAA AGAGTTATCC ACAGTTTGTG TAAAACTTTT GTGTTTAAGT TATAATTAAG 3300 

CTAGTCAGTT TATACTTTCA GTAATTCAAA CATATGGAGG CAAATATGAA ACATCTAAAA 3360 

ACATTTTACA AAAAATGGTT TCAATTATTA GTCGTTATCG TCATTAGCTT TTTTAGTGGA 342 0 

GCCTTGGGTA GTTTTTCAAT AACTCAACTA ACTCAAAAAA GTAGTGTAAA CAACTCTAAC 3480 

AACAATAGTA CTATTACACA AACTGCCTAT AAGAACGAAA ATTCAACAAC ACAGGCTGTT 3 540 

AACAAAGTAA AAGATGCTGT TGTTTCTGTT ATTACTTATT CGGCAAACAG ACAAAATAGC 3 600 

GTATTTGGCA ATGATGATAC TGACACAGAT TCTCAGCGAA TCTCTAGTGA AGGATCTGGA 3 660 

GTTATTTATA AAAAGAATGA TAAAGAAGCT TACATCGTCA CCAACAATCA CGTTATTAAT 3 720 

GGCGCCAgCA AAGTAGATAT TCGATTGTCA GATGGGACTA AAGTACCTGG AGAAATTGTC 3780 

GGAGCTGACA CTTTCTCTGA TATTGCTGTC GTCAAAATCT CTTCAGAAAA AGTGACAACA 3 840 

GTAGCTGAGT TTGGTGATTC TAGTAAGTTA ACTGTAGGAG AAACTGCTAT TGCCATCGGT 3900 

AGCCCGTTAG GTTCTGAATA TGCAAATACT GTCACTCAAG GTATCGTATC CAGTCTCAAT 3 960 



WO 98/18931 



PCT7US97/19588 



AGAAATGTAT CCTTAAAATC GGAAGATGGA CAAGCTATTT CTACAAAAGC CATCCAAACT 4 02 0 

GATACTGCTA TTAACCCAGG TAACTCTGGC GGCCCACTGA TCAATATTCA AGGGCAGGTT 4080 

ATCGGAATTA CCTCAAGTAA AATTGCTACA AATGGAGGAA CATCTGTAGA AGGTCTTGGT 4140 

TTCGCAATTC CTGCAAATGA TGCTATCAAT ATTATTGAAC AGTTAGAAAA AAACGGAAAA 4200 

GTGACGCGTC CAGCTTTGGG AATCCAGATG GTTAATTTAT CTAATGTGAG TACAAGCGAC 4260 

AT C AG AAG AC TCAATATTCC AAGTAATGTT ACATCTGGTG TAATTGTTCG TTCGGTACAA 4320 

AGTAATATGC CTGCCAATGG TCACCTTGAA AAATACGATG TAATTACAAA AGTAGATGAC 4380 

AAAGAGATTG CTTCATCAAC AGACTTACAA AGTGCTCTTT ACAACCATTC TATCGGAGAC 4440 

ACCATTAAGA TAACCTACTA TCGTAACGGG AAAGAAGAAA CTACCTCTAT CAAACTTAAC 4 500 

AAGAGTTCAG GTGATTTAGA ATCTTAATTG ACATCTATGT AAAGAAAGCT TTACATAAGA 4560 

GAAAAGATGT GTTAGTGTAG AATCATGGAA AAATTTGAAA TGATTTCTAT CACAGATATA 4 620 

CAAAAAAATC CCTATCAACC CCGAAAAGAA TTTGATAGAG AAAAACTAGA TGAACTAGCA 4 680 

CAGTCTATCA AAG AAAAT G G GGTCATTCAA CCGATTATTG TTCGTCAATC TCCTGTTATT 4740 

GGTTATGAAA TCcTTGCAGG AGAGAGACGC TATCGGGCTT CACTTTTAGC TGGTCTACGG 4 800 

TCTATCCCAG CTGTTGTTAA ACAGATTTCA GACCAAGAGA TGATGGTCCA GTCCATTATT 4 8 SO 

GAAAATTTAC AGAGAGAAAA TTTAAACCCA ATAGAAGAAG CACGCGCCTA TGAATCTCTC 4 920 

GTAGAGAAAG GATTCACCCA TGCTGAAATT GCAGATAAGA TGGGCAAGTC TCGTCCATAT 4 980 

ATCAGCAACT CCATTCGTTT ACTTTCCTTG CCAGAACAGA TTCTTTCAGA AGTAGAAAAT 5 040 

GGCAAACTAT CACAAGCCCA TGCGCGTTCC CTAGTTGGGT TAAATAAGGA ACAACAAGAC 5100 

TATTTCTTTC AACGGATTAT AGAAGAAGAT ATTTCTGTAA GGAAATTAGA AGCTCTTCTG 5160 

ACAGAGAAAA AACAAAAGAA ACAGCAAAAA ACTAATCATT TCATACAAAA TGAAGAAAAA 522 0 

CAGTTAAGAA AACTACTCGG ATTAGATGTA GAAATTAAAC TATCTAAAAA AGACAGTGGA 52 80 

AAAATCATTA TTTCTTTTTC AAATCAAGAA GAATATAGTA GAATTATCAA CAGCCTGAAA 534 0 

TAAGGCTGTT CTTTTATTTT TTTATCTCAC AAGGTTATCC ACTATGTTTT TCGATAAAAA 54 00 

GCTTAATAAA TCAATAATTT CTTCTTTTAT CCCCAACCTG TGGATAAAGT TTGGTAACAT 54 60 

TGTGGATTAT TTTTCACAGC TTGTGGAAAA TTCTTGCTAT CTATGGTAAA ATATCTCTAG 5520 

TATTAAACTT TTAAATAGTA AAGGAGGAGA AAGGATTGAA AGAAAAACAA TTTTGGAATC 5580 

GTATATTAGA ATTTGCACAA GAAAGACTGA CTCGATCCAT GTATGATTTC TATGCTATTC 5 640 

AAGCTGAACT CATCAAGGTA GAGGAAAATG TTGCCACTAT ATTTCTACCT CGCTCTGAAA 57 00 

TGGAAATGGT CTGGGAAAAA CAACTAAAAG ATATTATTGT AGTAGCTGGT TTTGAAATTT 57 6 0 
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ATGACGCTGA AATAACTCCC CACTATATTT TCACCAAACC TCAAGATACG ACTAGCTCAC 5 82 0 

AAGTTGAAGA AGCTACAAAT TTAACTCTTT ATAACTATAG TCCAAAGTTA GTATCTATTC 5880 

CTTATTCAGA TACGGGATTA AAAGAAAAGT ATACCTTTGA TAACTTTATT CAAGGGGATG 594 0 

GAAATGTTTG GGCTGTATCA GCCGCTTTAG CTGTCTCTGA AGATTTGGCT CTGACCTATA 6000 

ACCCTCTTTT TATCTATGGA GGACCAGGCC TTGGTAAGAC TCACTTATTA AACGCTATTG 6060 

GAAATGAAAT TCTAAAAAAT ATTCCTAATG CGCGTGTTAA ATATATCCCT GCCGAAAGCT 612 0 

TTATTAATGA CTTTCTTGAT CACCTAAGAC TTGGGGAAAT GGAAAAGTTT AAAAAGACCT 6180 

ATCGTAGTCT TGATCTTTTG TTAATCGATG ATATCCAGTC ACTCAGCGGA AAAAAAGTCG 624 0 

CAACTCAGGA AGAATTTTTC AATACCTTTA ACGCCCTTCA TGACAAGCAA AAACAGATTG 6300 

TCCTAACGAG TGATCGTAGT CCAAAACATC TAGAAGGGCT CGAGGAGAGG CTTGTCACGC 63 60 

GTTTTAGTTG GGGATTGACA CAAACTATCA CCCCCCCTGA CTTTGAAACA CGTATTGCCA 642 0 

TTTTACAAAG TAAGACGGAA CATTTAGGCT ACAATTTCCA AAGTGATACT CTAGAATACC 6480 

TAGCTGGGCA ATTTGATTCA AATGTTCGAG ATCTTGAGGG AGCCATCAAC GACATCACTT 654 0 

TAATTGCCAG AGTAAAAAAA AT CAAGG AT A TCACTATTGA TATTGCTGCA GAAGCCATTA 6600 

GAGCCCGCAA ACAAGATGTT AGCCAAATGC TCGTCATCCC AATTGATAAA ATCCAAACTG 6 660 

AAGTTGGTAA CTTTTATGGT GTTAGTATCA AAGAAATGAA GGGAAGTAGA CGCCTTCAAA 672 0 

ATATTGTTTT GGCCCGTCAA GTAGCCATGT ATTTATCTAG AGAACTAACA GATAATAGTC 67 80 

TTCCAAAAAT TGGGAAGGAA TTTGGGGGAA AAGATCATAC CACAGTCATT CATGCCCATG 6840 

CCAAAATAAA ATCTTTGATT GATCAAGACG ATAATTTACG TTTAGAAATT GAATCAATCA 6900 

AAAAGAAAAT CAAATAATTT GTGGATAACT TTTAGTTTTT TATCTTTTTT ATCCACATTT 6960 

TTTAAACAAG CTAAAAAACT TGATATGACT TGTTTAAAGG CTGTTTTCCA CAGATTTCAC 7 020 

AGACTCTATT ATTACTATTA TCTTTCTAAT ACTAAAAATA AATAAAGGAG AATCCATGAT 7 080 

TCATTTTTCA ATTAATAAAA ATTTATTTCT ACAAGCATTA AA TACT ACTA AGAGAGCTAT 7140 

TAGTTCTAAA AATGCCATTC CTATTTTATC AACAGTAAAA ATTGACGTGA CCAATGAAGG 7200 

TATTACTTTA ATTGGTTCAA ATGGTCAAAT TTCAATTGAA AATTTTATTT CTCAAAAAAA 7 2 60 

TGAAGATGCT GGTTTGTTAA TTACTTCTTT AGGTTCGATC CTTCTTGAAG CTTCTTTCTT 7320 

TATCAATGTA GTATCTAGTT TACCTGATGT AACTCTTGAT TTTAAAGAAA TTGAACAAAA 7380 

TCAAATTGTT TTAACCAGTG GCAAATCAGA AATTACC'CTA AAAGGAAAAG ATAGCGAACA 7440 

ATATCCACGA ATCCAAGAAA TTTCAGCAAG CACTCCTTTA ATACTTGAAA CAAAATTACT 7 500 
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CAAGAAAATT ATTAATGAAA CAGCCTTTGC TGCAAGTACA CAAGAGAGTC GTCCGATTTT 756 0 

AACAGGTGTC CACTTCGTAT TGAGTCAACA CAAAGAGTTA AAAACAGTTG CAACAGACTC 7 62 0 

TCATCGCCTA AGCCAGAAAA AATTGACTCT TGAAAAAAAT AGTGATGATT TTGATGTCGT 7680 

AATTCCTAGC CGTTCTCTAC GCGAATTTTC AGCGGTATTT ACAGATGATA TCGAAACTGT 7740 

AGAGATTTTC TTTGCCAATA ACCAAATCCT CTTTAGAAGC GAAAATATTA GCTTCTATAC 7 800 

TCGTCTCCTA GAAGGAAACT ATCCTGATAC AGATCGCTTG ATTCCAACAG ACTTTAACAC 7 860 

TACTATTACT TTTAATGTGG TAAACTTACG CCAGTCAATG GAGCGTGCCC GTCTTTTATC 7 920 

AAGTGCGACT CAAAATGGTA CTGTGAAACT TGAAATTAAG GATGGGGTTG TTAGCGCCCA 7 980 

TGTTCACTCT CCAGAAGTTG GTAAAGTAAA CGAAGAAATC GATACTGATC AGGTTACTGG 8040 

TGAAGATTTG ACCATTAGTT TCAACCCAAC TTACTTGATT GATTCTCTTA AAGCTTTAAA 8100 

TAGCGAAAAG GTGACTATTA GCTTTATCTC AGCTGTTCGT CCATTTACTC TTGTGCCAGC 8160 

AGATACTGAC GAAGACTTCA TGCAGCTCAT TACACCAGTT CGTACAAATT AAGTGAAAGA 8220 

GGTTGAGCCT GGCTCGCCTC TTTTATGATA TAATCGAAAA AGAAAAGGAG AGTAGTATGT 8280 

ATCAAGTTGG AAATTTTGTT GAGATGAAAA AATCACACGC TTGTACAATC AAGTCGACTG 83 40 

GTAAAAAGGC TAATCGTTGG GAAATTACAC GTGTAGGAGC AGATATCAAA ATAAAATGTA 84 00 

GTAATTGTGA GCATGTTGTC ATGATGGGGC GATATGATTT TGAGCGAAAA ATGAATAAAA 8460 

TTATTGACTG AGAACCCTTA GTTAGAGGGT TAGCACTTTA TCCCTTTTTG TGTTATAATA 8520 

TTAGGGATTG AAATGAAAAC GGAGAATGAG AAATATGGCT TTGACAGCAG GTATCGTTGG 8580 

TTTGCCAAAC GTTGGTAAAT CAACACTATT TAATGCAATT AC AAAAG C AG GAGCAGAGGC 8640 

AGCAAACTAC CCATTTGCGA CGATTGATCC AAATGTTGGA ATGGTGGAAG TTCCAGATGA 87 00 

ACGCCTACAA AAACTAACTG AAATGATAAC TCCTAAAAAG ACAGTTCCCA CAACATTTGA 87 50 

ATTTACAGAT ATTGCAGGGA TTGTAAAAGG AGCTTCAAAA GGAGAGGGGC TAGGGAATAA 8 82 0 

ATTCTTGGCC AATATTCGTG AAGTAGATGC GATTGTTCAC GTAGTTCGTG CTTTTGATGA 8880 

TGAAAATGTA ATGCGCGAGC AAGGACGTGA AGACGCCTTT GTAGATCCAC TTGCAGATAT 8 94 0 

TGATACCATT AATCTGGAAT TGATTCTTGC TGACTTAGAA TCAGTGAACA AACGATATGC 9000 

GCGTGTAGAA AAGATGGCAC GTACGCAAAA AGATAAAGAA TCAGTAGCAG AATTCAATGT 90 60 

TCTTCAAAAG ATTAAACCAG TCCTAGAAGA CGGGAAATCA GCTCGTACCA TTGAATTTAC 912 0 

AGATGAGGAA CAAAAGGTTG TCAAAGGTCT TTTCCTTTTG ACGACTAAAC CAGTTCTTTA 9180 

TGTAGCTAAT GTGGACGAGG ATGTGGTTTC AGAACCTGAC TCTATCGACT ATGTCAAACA 9240 

AATTCGTGAA TTTGCAGCGA CAGAAAATGC TGAAGTAGTC GTTATTTCTG CGCGTGCTGA 9 3 00 
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GGAAGAAATT TCTGAATTGA ATGATGAAGA TAAAAAAGAG TTTCTTGAAG CCATTGGTTT 93 60 

GACAGAATCA GGTGTAGATA AGTTGACGCG TGCAGCTTAC CACTTGCTTG GATTGGGAAC 9420 

TTACTTCACA GCTGGTGAAA AAGAAGTTCG CGCTTGGACT TTCAAACGTG GTATGAAGGC 94 80 

TCCTCAAGCA GCTGGTATTA TCCACTCAGA CTTTGAAAAA GGCTTTATTC GTGCAGTAAC 9 540 

CATGTCATAT GAAGATCTAG TGAAATACGG ATCTGAAAAG GCCGTAAAAG AAGCTGGACG 9600 

CTTGCGTGAA GAAGGAAAAG AATATATCGT TCAAGATGGC GATATCATGG AATTCCGCTT 9660 

TAATGTCTAA AAATTAATAA ATGGTGTCAA TTAGGTTGGA AAAAAATTCC AACCCTTTTG 9720 

GCTTTTGAAA GGAAAAATAA ATGACCAAAT TACTTGTAGG CTTGGGAAAT CCAGGGGATA 97 80 

AATATTTTGA AACAAAACAC AATGTTGGTT TTATGTTGAT TGATCAACTA GCGAAGAAAC 9 840 

AGAATGTCAC TTTTACACAC GATAAGATAT TTCAAGCTGA CCTAGCATCC TTTTTCCTAA 9 900 

ATGGAGAAAA AATTTATCTG GTTAAACCAA CGACCTTTAT GAATGAAAGT GGAAAAGCAG 99 60 

TTCATGCTTT ATTAACTTAC TATGGTTTGG ATATTGACGA TTTACTTATC ATTTACGATG 1002 0 

ATCTTGACAT GGAAGTTGGG AAAATTCGTT TAAGAGCAAA AGGCTCAGCA GGTGGTCATA 10080 

ATGGTATCAA GTCTATTATT CAACATATAG GAACTCAGGT CTTTAACCGT GTTAAGATTG 1014 0 

GAATTGGAAG ACCTAAAAAT GGTATGTCAG TTGTTCATCA TGTTTTGAGT AAGTTTGACA 10200 

GGGATGATTA TATCGGTATT TTACAGTCTG TTGACAAAGT TGACGATTCT GTAAACTACT 10260 

ATTTACAAGA GAAAAATTTT GAGAAAACAA TGCAGAGGTA TAACGGATAA ATGGTGACCT 1032 0 

TATTAGATTT AT T CTCAGAA AATGATCAGA TTAAAAAATG GCATCAAAAT TTAACAGATA 10380 

AGAAAAGACA ACTAATACTT GGTTTATCAA CATCTACTAA GGCTCTTGCA ATTGCAAGCA 104 4 0 

GTTTAGAAAA AGAAGATAGG ATTGTGTTAT TGACGTCAAC T T ATGGAG AA GCAGAAGGAC 10500 

TTGTTAGTGA TCTTATTTCT ATCTTGGGTG AGGAACTCGT CTATCCATTT TTGGTAGATG 10560 

ATGCTCCTAT GGTGGAGTTT TTGATGTCTT CACAGGAAAA AATTATTTCA CGGGTTGAAG 10 620 

CCTTGCGTTT TTTGACTGAT TCATCTAAGA AAGGGATTTT AGTTTGTAAT ATCGCAGCAA 10680 

GTCGATTGAT TTTACCGTCT CCCAATGCAT TCAAAGATAG TATTGTAAAA ATCTCAGTTG 1074 0 

GTGAAGAATA TGATCAACAC GCGTTTATCC ATCAGTTAAA GGAAAATGGC TATCGAAAAG 10800 

TTACTCAAGT ACAAACTCAG GGCGAATTTA GTCTTCGAGG AGATATTTTA GATATTTTTG 108 60 

AAATATCCCA GTTAGAACCT TGTCGAATTG AGTTTTTTGG TGATGAAATT GATGGTATCA 10920 

GGTCATTTGA AGTAGAAACA CAATTATCGA AAGAAAATAA GACAGAACTC ACTATCTTTC 10980 

CAGCTAGTGA TATGCTTTTG AGAGAAAAGG ATTATCAACG AGGACAGTCA GCTTTAGAAA 11040 
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AACAAATTTC AAAAACTTTA TCACCTATTT TGAAATCATA CCTAGAAGAA ATTCTTTCAA 1110 0 

GTTTTCACCA AAAACAAAGT CATGCAGACT CTCGGAAGTT TTTATCTTTG TGCTATGATA 11160 

AGACATGGAC TGTCTTTGAT TATATTGAAA AAGATACTCC AATATTCTTT GATGATTATC 112 2 0 

AAAAATTGAT GAATCAGTAT GAAGTCTTTG AAAGAGACTT AGCGCAGTAC TTTACAGAAG 11280 

AATTACAGAA TAGTAAAGCA TTTTCTGATA TGCAGTATTT TTCTGATATT GAACAAATCT 1134 0 

ATAAAAAACA AAGTCCAGTG ACCTTTTTCT CTAATCTTCA AAAGGGTTTA GGAAATCTCA 11400 

AATTTGACAA AATTTATCAA TTCAATCAAT ATCCTATGCA GGAATTTTTC AATCAGTTTT 11460 

CTTTTCTAAA AGAAGAAATT GAACGATATA AAAAAATGGA TTACACCATT ATTCTGCAGT 1152 0 

CTAGCAATTC AATGGGAAGT AAAACATTGG AGGATATGTT AGAGGAATAT CAGATTAAAT 11580 

TGGATTCTAG AGATAAGACA AATATCTGTA AAGAATCTGT AAACTTAATA GAGGGTAATC 11640 

TCAGACATGG TTTTCATTTT GTAGATGAAA AGATTTTATT GATAACTGAA CATGAGATTT 11700 

TTCAAAAGAA ATTAAAGCGT CGTTTTCGAA GACAACATGT TTCAAATGCA GAGAGATTAA 11760 

AAGATTACAA TGAACTTGAA AAAGGGGACT ATGTTGTCCA TCATATCCAT GGGATTGGTC 11820 

AATATCTAGG AATTGAAACC ATTGAAATCA AGGGAATTCA TCGCGATTAT GTCAGTGTCC 11880 

AATACCAAAA TGGTGATCAA ATTTCTATCC CCGTGGAACA GATTCATCTA CTGTCCAAAT 11940 

ATATTTCAAG TGATGGTAAA GCTCCAAAAC TCAATAAATT AAATGACGGT CATTTTAAAA 12000 

AGGCCAAGCA AAAGGTTAAG AACCAGGTAG AGGATATAGC TGATGATTTA ATCAAACTCT 12 060 

ACTCTGAACG TAGTCAGTTG AAGGGTTTTG CTTTCTCAGC TGATGATGAT GATCAAGATG 12120 

CCTTTGATGA TGCTTTCCCT TATGTTGAAA CGGATGATCA ACTTCGTAGT ATTGAGGAAA 12180 

TCAAGAGGGA TATGCAGGCT TCTCAGCCAA TGGATCGACT TTTAGTTGGG GATGTTGGTT 12 2 40 

TTGGAAAGAC TGAAGTTGCT ATGCGTGCAG CCTTTAAAGC AGTCAATGAT CACAAACAGG 12 3 00 

TTGTCATTCT AGTTCCGACG ACGGTTTTAG CGCAACAGCA CTATACGAAT TTTAAGGAAC 12 3 50 

GATTCCAAAA TTTTGCAGTT AATATTGATG TGTTGAGTCG CTTTAGAAGT AAAAAAGAGC 12 420 

AGACTGCAAC ACTTGAAAAA TTGAAAAACG GTCAAGTCGA TATTTTGATT GGAACACATC 124 80 

GTGTTTTGTC AAAAGATGTT GTGTTTGCTG ATTTGGGCTT GATGATTATT GATGAGGAAC 12 54 0 

AGCGATTTGG TGTCAAGCAT AAGGAAACTT TGAAAGAACT GAAGAAACAA GTGGATGTCC 12600 

TAACCTTGAC CGCTACGCCA ATCCCTCGTA CCCTCCATAT GTCTATGCTG GGAATCAGAG 12 660 

ATTTATCTGT TATTGAAACT CCGCCGACTA ATCGCTATCC TGTTCAGACC TATGTTTTGG 12720 

AAAAGAAT G A TAGTGTCATT CGTGATGCTG TCTTGCGTGA AATGGAGCGT GGAGGTCAAG 12780 

TTTATTATCT TTACAACAAA GTTGACACAA TTGTTCAGAA GGTTTCAGAA TTACAGGAGT 12840 
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TGATTCCGGA GGCTTCGATT GGATATGTTC ATGGTCGAAT GAGTGAAGTC CAGTTGGAAA 12 900 

ATACTCTATT AGACTTTATT GAGGGACAAT ACGATATCTT GGTGACGACT ACTATTATTG 12 960 

AGACAGGGGT GGACATTCCA AATGCTAATA CTTTATTTAT TGAAAATGCG GACCATATGG 1302 0 

GCTTGTCAAC CTTATATCAG TTAAGAGGAA GAGTCGGTCG TAGTAATCGT ATTGCTTATG 13080 

CTTATCTCAT GTATCGTCCA GAAAAATCAA TCAGTGAAGT CTCTGAAAAG AGATTAGAAG 1314 0 

CGATTAAAGG ATTTACAGAA TTGGGCTCTG GCTTTAAGAT TGCAATGCGA GATCTTTCGA 13200 

TTCGTGGAGC AGGAAATCTT TTAGGAAAAT CCCAGTCTGG TTTCATTGAT TCTGTTGGTT 132 60 

TTGAATTGTA TTCGCAGTTA TTAGAGGAAG CTATTGCTAA ACGAAACGGT AATGCTAACG 13320 

CTAACACAAG AACCAAAGGG AATGCTGAGT TGATTTTGCA AATTGATGCC TATCTTCCTG 1338 0 

ATACTTATAT TTCTGATCAA CGACATAAGA TTGAAATTTA CAAGAAAATT CGTCAAATTG 13440 

ACAACCGTGT CAATTATGAA GAGTTACAAG AGGAGTTGAT AGACCGTTTT GGAGAATACC 13500 

CAGATGTAGT AGCCTATCTG TTAGAGATTG GTTTGGTCAA ATCATACTTG GACAAGGTCT 13 560 

TTGTTCAACG TGTGGAAAGA AAAGATAATA AAATTACAAT TCAATTTGAA AAAGTCACTC 13 620 

AACGACTGTT TTTAGCTCAA GATTATTTTA AAGCTTTATC CGTAACGAAC TTAAAAGCAG 13680 

GCATCGCTGA GAATAAGGGA TTAATGGAGC TTGTATTTGA TGTCCAAAAT AAGAAAGATT 13740 

ATGAAATTTT AGAAGGTTTG CTGATTTTTG GAGAAAGTTT ATTAGAGATA AAAGAGTCTA 13 800 

AGGAAGAAAA TTCCATTTGA TATTTTTCTT CTATAAAATA GATAAAAATG GTACAATAAT 13 860 

AAATTGAGGT AATAAGGATG AGATTAGATA AATATTTAAA AGTATCGCGA ATTATCAAGC 13 920 

GTCGTACAGT CGCAAAGGAA GTAGCAGATA AAGGTAGAAT CAAGGTTAAT GGAATCTTGG 13 980 

CCAAAAGTTC AACGGACTTG AAAGTTAATG ACCAAGTTGA AATTCGCTTT GGCAATAAG'J' 14040 

TGCTGCTTGT AAAAGTACTA GAGATGAAAG ATAGTACAAA AAAAGAAGAT GCAGCAGGAA 14100 

TGTATGAAAT TATCAGTGAA ACACGGGTAG AAGAAAATGT CTAAAAATAT TGTACAATTG 14160 

AATAATTCTT TTATTCAAAA TGAATACCAA CGTCGTCGCT ACCTGATGAA AGAACGACAA 14220 

AAACGGAATC GTTTTATGGG AGGGGTATTG ATTTTGATTA TGCTATTATT TATCTTGCCA 14280 

ACTTTTAATT TAGCGCAGAG TTATCAGCAA TTACTCCAAA GACGTCAGCA ATTAGCAGAC 14 340 

TTGCAAACTC AGTATCAAAC TTTGAGTGAT GAAAAGGATA AGGAGACAGC ATTTGCTACC 14400 

AAGTTGAAAG ATGAAGATTA TGCTGCTAAA TATACACGAG CGAAGTACTA TTATTCTAAG 14460 

TCGAGGGAAA AAGTTTATAC GATTCCTGAC TTGCTTCAAA GGTGATAAAA TGGAAAATTT 14520 

ATTAGACGTA ATAGAGCAAT TTTTGAGTTT GTCAGATGAA AAGCTGGAAG AATTGGCTGA 14580 
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TAAAAATCAA TTATTGCGTT TACAAGAAGA AAAGGAAAGG AAGAATGCGT AAATTCTTAA 14 640 

TTATTTTGTT GCTACCAAGT TTTTTGACCA TTTCAAAAGT CGTTAGCACA GAAAAAGAAG 14700 

TCGTCTATAC TTCGAAAGAA ATTTATTACC TTTCACAATC TGACTTTGGT ATTTATTTTA 14760 

GAGAAAAATT AAGTTCTCCC ATGGTTTATG GAGAGGTTCC TGTTTATCCG AATGAAGATT 14B20 

TAGTAGTGGA ATCTGGGAAA TTGACTCCCA AAACAAGTTT TCAAATAACC GAGTGGCGCT 14 880 

TAAATAAACA AGGAATTCCA GTATTTAAGC TATCAAATCA TCAATTTATA GCTGCGGACA 14940 

AACGATTTTT ATATGATCAA TCAGAGGTAA CTCCAACAAT AAAAAAAGTA TGGTTAGAAT 15000 

CTGACTTTAA ACTGTACAAT AGTCCTTATG ATTTAAAAGA AGTGAAATCA TCCTTATCAG 15060 

CTTATTCGCA AGTATCAATC GACAAGACCA TGTTTGTAGA AGGAAGAGAA TTTCTACATA 15120 

TTGATCAGGC TGGATGGGTA GCTAAAGAAT CAACTTCTGA AGAAGATAAT CGGATGAGTA 15180 

AAGTTCAAGA AATGTTATCT GAAAAATATC AGAAAGATTC TTTCTCTATT TATGTTAAGC 152 40 

AACTGACTAC TGGAAAAGAA GCTGGTATCA ATCAAGATGA AAAGATGTAT GCAGCCAGCG 153 00 

TTTTGAAACT CTCTTATCTC TATTATACGC AAGAAAAAAT AAATGAGGGT CTTTATCAGT 153 60 

TAGATACGAC TGTAAAATAC GTATCTGCAG TCAATGATTT TCCAGGTTCT TATAAACCAG 1542 0 

AGGGAAGTGG TAGTCTTCCT AAAAAAGAAG ATAATAAAGA ATATTCTTTA AAGGATTTAA 154 80 

TTACGAAAGT ATCAAAAGAA TCTGATAATG TAGCTCATAA TCTATTGGGA TATTACATTT 15540 

CAAACCAATC TGATGCCACA TTCAAATCCA AGATGTCTGC CATTATGGGA GATGATTGGG 15600 

ATCCAAAAGA AAAATTGATT TCTTCTAAGA TGGCCGGGAA GTTTATGGAA GCTATTTATA 15660 

ATCAAAATGG ATTTGTGCTA GAGTCTTTGA CTAAAACAGA TTTTGATAGT CAGCGAATTG 15720 

CCAAAGGTGT TTCTGTTAAA GTAGCTCATA AAATTGGAGA TGCGGATGAA TTTAAGCATG 157 8 0 

ATACGGGTGT TGTCTATGCA GATTCTCCAT TTATTCTTTC TATTTTCACT AAGAATTCTG 15 840 

ATTATGATAC GATTTCTAAG ATAGCCAAGG ATGTTTATGA GGTTCTAAAA TGAGGGAACC 15900 

AGATTTTTTA AATCATTTTC TCAAGAAGGG ATATTTCAAA AAGCATGCTA AGGCGGTTCT 15960 

AGCTCTTTCT GGTGGATTAG ATTCCATGTT TCTATTTAAG GTATTGTCTA CTTATCAAAA 16020 

AGAGTTAGAG ATTGAATTGA TTCTAGCTCA TGTGAATCAT AAGCAGAGAA TTGAATCAGA 16080 

TTGGGAAGAA AAGGAATTAA GGAAGTTGGC TGCTGAAGCA GAGCTTCCTA TTTATATCAG 16140 

CAATTTTTCA GGAGAATTTT CAGAAGCGCG TGCACGAAAT TTTCGTTATG ATTTTTTTCA 16200 

AGAGGTCATG AAAAAGACAG GTGCGACAGC TTTAGTCACT GCCCACCATG CTGATGATCA 16260 

GGTGGAAACG ATTTTTATGC GCTTGATTCG AGGAACTCGC TTGCGCTATC TATCAGGAAT 16320 

TAAGGAGAAG CAAGTAGTCG GAGAGATAGA AATCATTCGT CCCTTCTTGC ATTTTCAGAA 16380 
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AAAAGACTTT CCATCAATTT TTCACTTTGA AGATACATCA AATCAGGAGA ATCATTATTT 16440 

TCGAAATCGT ATTCGAAATT CTTACTTACC AGAATTGGAA AAAGAAAATC CTCGATTTAG 16500 

GGATGCAATC TTAGGCATTG GCAATGAAAT TTTAGATTAT GATTTGGCAA TAGCTGAATT 16560 

ATCTAACAAT ATTAATGTGG AAGATTTACA GCAGTTATTT TCTTACTCTG AGTCTACACA 16620 

AAGAGTTTTA CTTCAAACTT ATCTGAATCG TTTTCCAGAT TTGAATCTTA CAAAAGCTCA 16 680 

GTTTGCTGAA GTTCAGCAGA TTTTAAAATC TAAAAGCCAG TATCGTCATC CGATTAAAAA 16740 

TGGCTATGAA TTGATAAAAG AGTACCAACA GTTTCAGATT TGTAAAATCA GTCCGCAGgC 16 800 

TGATGAAAAG GAAGATGAAC TTGTGTTACA CTATCAAAAT CAGGTAGCTT ATCAAGGATA 16860 

TTTATTTTCT TTTGGACTTC CATTAGAAGG TGAATTAATT CAACAAATAC CTGTTTCACG 16920 

TGAAACATCC ATACACATTC GTCATCGAAA AACAGGAGAT GTTTTGATTA AAAATGGGCA 16980 

TAGAAAAAAA CTCAGACGTT TATTTATTGA TTTGAAAATC CCTATGGAAA AGAGAAACTC 17 040 

TGCTCTTATT ATTGAGCAAT TTGGTGAAAT TGTCTCAATT TTGGGAATTG CGACCAATAA 17100 

TTTGAGTAAA AAAACGAAAA ATGATATAAT GAACACTGTA CTTTATATAG AAAAAATAGA 17160 

TAGGTAAAAA ATGTTAGAAA ACGATATTAA AAAAGTCCTC GTTTCACACG ATGAAATTAC 17220 

AGAAGCAGCT AAAAAACTAG GTGCTCAATT AACTAAAGAC TATGCAGGAA AAAATCCAAT 172 80 

CTTAGTTGGG ATTTTAAAAG GATCTATTCC TTTTATGGCT GAATTGGTCA AACATATTGA 17340 

TACACATATT GAAATGGACT TCATGATGGT TTCTAGCTAC CATGGTGGAA CAGCAAGTAG 174 0O 

TGGTGTTATC AATATTAAAC AAGATGTGAC TCAAGATATC AAAGGAAGAC ATGTTCTATT 174 60 

TGTAGAAGAT ATCATTGATA CAGGTCAAAC TTTGAAGAAT TTGCGAGATA TGTTTAAAGA 17 520 

AAGAGAAGCA GCTTCTGTTA AAATTGCAAC CTTGTTGGAT AAACCAGAAG GACGTGTTGT 175 80 

AGAAATTGAG GCAGACTATA CTTGCTTTAC TATCCCAAAT GAGTTTGTAG TAGGTTATGG 17 64 0 

TTTAGACTAC AAAGAAAATT ATCGTAATCT TCCTTATATT GGAGTATTGA AAGAGGAAGT 177 00 

GTATTCAAAT TAGAAAGAAT AATCTTTAAT GAAAAAACAA AATAATGGTT TAATTAAAAA 177 60 

TCCTTTTCTA TGGTTATTAT TTATCTTTTT CCTTGTGACA GGATTCCAGT ATTTCTATTC 17 82 0 

TGGGAATAAC TCAGGAGGAA GTCAGCAAAT CAACTATACT GAGTTGGTAC AAGAAATTAC 17880 

CGATGGTAAT GTAAAAGAAT TAACTTACCA ACCAAATGGT AGTGTTATCG AAGTTTCTGG 17940 

TGTCTATAAA AATCCTAAAA CAAGTAAAGA AGAAACAGGT ATTCAGTTTT TCACGCCATC 18 000 

TGTTACTAAG GTAGAGAAAT TTACCAGCAC TATTCTTCCT GCAGATACTA CCGTATCAGA 18060 

ATTGCAAAAA CTTGCTACTG AC C ATAAAGC AGAAGTAACT GTTAAGCATG AAAGTTCAAG 1812 0 
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TGGTATATGG ATTAATCTAC TCGTATCCAT TGTGCCAT'PT GGAATTCTAT TCTTCTTCCT 18180 

ATTCTCTATG ATGGGAAATA TGGGAGGAGG CAATGGCCGT AATCCAATGA GTTTTGGACG 18240 

TAGTAAGGCT AAAGCAGCAA ATAAAGAAGA TATTAAAGTA AGATTTTCAG ATGTTGCTGG 183 0 0 

AGCTGAGGAA GAAAAACAAG AACTAGTTGA AGTTGTTGAG TTCTTAAAAG ATCCAAAACG 183 60 

ATTCACAAAA CTTGGAGCCC GTATTCCAGC AGGTGTTCTT TTGGAGGGAC CTCCGGGGAC 18420 

AGGTAAAACT TTGCTTGCTA AGGCAGTCGC TGGAGAAGCA GGTGTTCCAT TCTTTAGTAT 18480 

CTCAGGTTCT GACTTTGTAG AAATGTTTGT CGGAGTTGGA GCTAGTCGTG TTCGCTCTCT 1854 0 

TTTTGAGGAT GCCAAAAAAG CAGCACCAGC TATCATCTTT ATCGATGAAA TTGATGCTGT 18600 

TGGACGTCAA CGTGGAGTCG GTCTCGGCGG AGGTAATGAC GAACGTGAAC AAACCTTGAA 18660 

CCAACTTTTG ATTGAGATGG ATGGTTTTGA GGGAAATGAA GGGATTATCG TCATCGCTGC 1872 0 

GACAAACCGT TCAGATGTAC TTGACCCTGC CCTTTTGCGT CCAGGACGTT TTGATAGAAA 1878 0 

AGTATTGGTT GGTCGTCCTG ATGTTAAAGG TCGTGAAGCA ATCTTGAAAG TTCACGCTAA 18 840 

GAATAAGCCT TTAGCAGAAG ATGTTGATTT GAAATTAGTG GCTCAACAAA CTCCAGGCTT 18 900 

TGTTGGTGCT GATTTAGAGA ATGTCTTGAA TGAAGCAGCT TTAGTTGCTG CTCGTCGCAA 18960 

TAAATCGATA ATTGATGCTT CAGATATTGA TGAAGCAGAA GATAGAGTTA TTGCTGGACC 19 020 

TTCTAAGAAA GATAAGACAG TTTCACAAAA AGAACGAGAA TTGGTTGCTT ACCATGAGGC 19 080 

AGGACATACC ATTGTTGGTC TAGTCTTGTC GAATGCTCGC GTTGTCCATA AGGTTACAAT 19140 

TGTACCACGC GGCCGTGCAG GCGGATACAT GATTGCACTT CCTAAAGAGG ATCAAATGCT 192 00 

TCTATCTAAA GAAGATATGA AAGAGCAATT GGCTGGCTTA ATGGGTGGAC GTGTAGCTGA 19260 

AGAAATTATC TTTAATGTCC AAACCACAGG AGCTTCAAAC GACTTTGAAC AAGCGACACA 19320 

AATGGCACGT GCAATGGTTA CAGAGTACGG TATGAGTGAA AAACTTGGCC CAGTACAATA 193 80 

TGAAGGAAAC CATGCTATGC TTGGTGCACA GAGTCCTCAA AAATCAATTT CAGAACAAAC 19440 

AGCTTATGAA ATTGATGAAG AGGTTCGTTC ATTATTAAAT GAGGCACGAA ATAAAGCTGC 19500 

TGAAATTATT CAGTCAAATC GTGAAACTCA CAAGTTAATT GCAGAAGCAT TATTGAAATA 19560 

CGAAACATTG GATAGTACAC AAATTAAAGC TCTTTACGAA ACAGGAAAGA TGCCTGAAGC 19 620 

AGTAGAAGAG GAATCTCATG CACTATCCTA TGATGAAGTA AAGTCAAAAA TGAATGACGA 19 680 

AAAATAACCC TGAGAGAGGC TGGAGCCTCT CTTTTTTGTG CAGTTTAGGA GCTAAAGGGA 19740 

ACAGAATGGA GAAAATGGAA CAAATGTGTT TTCTAATCTG TTAGACTGTA TCTAGAAAGG 198 00 

GGAAAATTAT GATTAAAGAA TTGTATGAAG AAGTCCAAGG GACTGTGTAT AAGTGTAGAA 198 60 

ATGAATATTA CCTTCATTTA TGGGAATTGT CGGATTGGGA GCAAGAAGGC ATGCTCTGCT 19 92 0 
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TACATGAATT GATTAGTAGA GAAGAAGGAC TGGTAGACGA TATTCCACGT TTAAGGAAAT 19980 

ATTTCAAGAC CAAGTTTCGA AATCGAATTT TAGACTATAT CCGTAAACAG GAAAGTCAGA 20040 

AGCGTAGATA CGATAAAGAA CCCTATGAAG AAGTGGGTGA GATCAGTCAT CGTATAAGTG 2 0100 

AGGGGGGTCT CTGGCTAGAT GATTATTATC TCTTTCATGA AACACTAAGA GATTATAGAA 2 0160 

ACAAACAAAG TAAAGAGAAA CAAGAAGAAC TAGAACGCGT CTTAAGCAAT GAACGATTTC 2 0220 

GAGGGCGTCA AAGAGTATTA AGAGACTTAC GCATTGTGTT TAAGGAGTTT ACTATCCGTA 2 0280 

CCCACTAGTA AGTCATGCAA AAAAAATGAA AAAAATTAGA AAAAGTAGTT GACAAAGTTT 2 0340 

GAAAAGGCTG TATAATAGTA AGAGTTGAAA ATAACAACTC AGGTCCGTTG GTCAAGGGGT 2 0400 

TAAGACACCG CCTTTTCACG GCGGTAACAC GGGTTCGAAT CCCGTACGGA CTATGGTATG 2 04 60 

TTGCGTCAGG ACCACTTGAT GAAAAAAAGT TTAAAAAAAC TTAAAAATCT TCAAAAAAGT 20520 

GTTGACAAGC GAAAGCAGTT GTGATATACT AATATAGTTG TCGCTTGAGA GAAGCAAGTG 205 80 

ACAAAGACCT TTGAAAACTG AACAAGACGA ACCAATGTGC AGGGCGCTAC AACGTAAGTT 20640 

GTAGTACTGA ACAATGAAAA AAACAATAAA TCTGTCAGTG ACAGAAATGA GTAAGAACTC 20700 

AAACTTTTTA ATGAGAGTTT GATCCTGGCT CAGGACGAAC GCTGGCGGCG TGCCTAATAC 207 60 

ATGCAAGTAG AACGCTGAAG GAGGAGCTTG CTTCTCTGGA TGAGTTGCGA ACGGGTGAGT 20820 

AACGCGTAGG TAACCTGCCT GGTAGCGGGG GATAACTATT GGAAACGATA GCTAATACCG 20 8 80 

CATAAGAGTA GATGTTGCAT GACATTTGCT TAAAAGGTGC ACTTGCATCA CTACCAGATG 20940 

GACCTGCGTT GTATTAGCTA GTTGGTGGGG TAACGGCTCA CCAAGGCGAC GATACATAGC 21000 

CGACCTGAGA GGGTGATCGG CCACACTGGG ACTGAGACAC GGCCCAGACT CCTACGGGAG 21060 

GCAGCAGTAG GGAATCTTCG GCAATGGACG GAAGTCTGAC CGAGCAACGC CGCGTGAGTG 21120 

AAGAAGGTTT TCGGATCGTA AAGCTCTGTT GTAAGAGAAG AACGAGTGTG AGAGTGGAAA 2118 0 

GTTCACACTG TGACGGTATC TTACCAGAAA GGGACGGCTA ACTACGTGCC AGCAGCCGCG 21240 

GTAATACGTA GGTCCCGAGC GTTGTCCGGA TTTATTGGGC GTAAAGCGAG CGCAGGCGGT 21300 

TAGATAAGTC TGAAGTTAAA GGCTGTGGCT TAACCATA 2133 8 
(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH : 6273 base pairs 

(B) TYPE : nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 
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(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 

TGTTTTTAAA GAGCCGTGTC TGGATAGACT TTCGGACGCA ACGCTCTATT AGATAATGAA 60 

CTGCCTATAC ACAAGATTTC TAACCTTAGT CGACATGAGC TGAAACCTCT TATTTGTTAA 12 0 

GTAGTTCACA AAATATTATA CACCTATTTT ATGAATAGTC AACTGTCTTT ACAGTAAAAT 180 

T T T AGAAAAT CATGAAAATT TTCTCTTTCT TTCCATTTTA AGTGACATTC AGTCATTCTC 240 

ACATCAAAAA AGCCCAGACG AAATTGTCTG AGCATTCTTT TATCTAGTCG TTTAAGGAAG 3 00 

TTGAGTTCAG TATGTTTAAA GTCTCTGTCC CATCATTTCT TCAACAAACC TTGTTCTTGG 360 

AGAAACTCCT TGGCTACTTG CTTTGCTGAC TTGCCTTCAA CACCGACTTG GTAGTTGAGC 42 0 

TGGCTCATCT GGCTTTCTGT AATCTTACCA GCCAATGTAT TAAGAACTCT TTCCAACTCT 480 

GGGTGTTTCT TGAGAAGAGC TTCTTTCATG AGTGGAGCCC CTTGATAAGG TGGGAAGAGT 540 

TGCTTGTCAT CTTCCAAGAC CTGTAAATCA TAACGCTCCA ATTCCGCATC AGTCGAATAG 600 

GCATCCGTGA TTTGAATATC CCCTGACTGA ATAGCCTGAT AGCGAAGGGC TGGCTCAATG 660 

GTCGCTACAT TGAGATTGAG ACCATACATT GATTGCAAGC CCTTATTTCC ATCTTCACGG 720 

TCGTTAAACT CGAGTGTAAA ACCTGCCTTC AACTGCCCTT CCACTTTTTT CAAGTCTGAA 780 

ATGGTCTTCA AGCCATATTC TTGAGCAATC TTTTTCGGAA CAGCTACAGC ATAGGTGTTT 840 

TGATAAGACA TGGGTTTGAG ATAGGCTAGA TGATCCTGCT TAGCAATGCC ATCACGCGCC 900 

ACCTGATAAA CCTGTTCTGG TTCATGACTC ACCTTGGGTG ATGGTTGAAG CAAACTTTCA 960 

GTCACCGTAC CAGTAAATTC AGGATAGATG TCAATATCGC CTTTTTTCAG AGCTTCATAA 102 0 

AGGAAGCTTG TCTTCCCAAA ATTCGGTTTA ACAGTCGCAG TCATGCTGGT ATTTTCTTCA 1080 

ATCAGCAACT TATACATATT GGCCAAAATT TCTGGTTCTG GACCTATTTT CCCAGCAATA 1140 

ACCAAGTTTT CCTTCTCTTT TTGAACCAAA AGAGCTGGAC TATAAGACAG ACCCAGTAAT 12 00 

AAAGCCACCA AGGCAAAACC TGAGAAAATC GTCCGTAATT TTGCTTTTTC CATCACTTTT 12 60 

AGTAGGAAGT TAAAGGCAAT GGCTAGCACT GCAGAAGAAA GTGCCCCAAT CAAAATCAAA 13 20 

CTGGCATTAT TACGGTCAAT TCCCAAAAGA ATAAAGGAAC CTAGTCCCCC TGCACCAATC 13 80 

AAGGCCGCCA AGGTTGCCGT ACCGATAATC AAAACAGCTG CCGTCCGAAT CCCAGACATG 1440 

ATAACAGGCA TGGCGAGTGG AATTTCAAAT TTCTTGAGAC GTTCCCATCT GGTCATCCCA 1500 

AAGGCAATCC CAGCCTCTTG CAGGTTCGGA TCAATTCCCT TCAGCCCAGT GATAGTATTT 15 60 

TGCAAAATAG GGAAAATCGC ATAAATCACT AGAGCTGTCA AAGCCGGCAA GGTCCCAATT 1620 

CCCATCAAAG GGATAAAGAG CCCCAACAAG GCCAGAGACG GGATGGTCTG GAAAATACCT 1680 

GCAATCTGCA AGACCCAGTC GGCCAGCTTC TCATGATAGC GAAGAAAAAC AGCCAAGGGA 1740 
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ATCGCAAGCA AAATAGCTAG TAACAAGGTC AAAAGCGACA ACTGCAAATG TTGAGATAGA 18 00 

GCTGTCAACC AAT C ACT AAA ACGATCCTGA AAAGTTGCAA TTAAATTAGT CATGAACACT 18 60 

ACCTCCAAAC AAGTCTGCTA CAAAGTCTGT TGCAGGCGCT TTTAAAATTG TCTCGGGATT 192 0 

CGCTACCTGG CGAATTTCTC CATCCTGCAA GACAGCAATA CGGTCCGCCA ACTTCAAGGC 19 80 

TTCATCCGTA TCATGGGTTA CAAAAATCGT TGTCATCCCA AACTCTTTAT GCAATTCTTT 2 040 

TGTCAGAACC TGCAACTGTT TTCTCGAAAT AGCATCCAAG GCCGAAAAGG GTTCATCCAT 2100 

GAGGAAAATC TTGGGCTGAC CAATCATAGC TCGGACAATA CCGACCCGTT GCTGTTCTCC 2160 

AC C AG AT AAT TCACTAGGTA AGCGATGCCC ATACTCGGCT ACTGGTAAAC CAACCTTAGC 222 0 

CAAAAGCTCT TCTGTTTTCT TCGTAATTTC TTCCTTGCTC CACCCCTTCA TTTCAGGAAT 2280 

GAGAGCAATA TTTTCCGCAA CTGTTAGATT TGGAAAAAGA GCAATAGCCT GTAAAACATA 2340 

ACCAGTAGAA AGACGAAGTT CACGCTCATC ATAGTCTTTG ATGCGCTTCC CATCCATATA 2400 

AATATTTCCA TCAGTTGGTT CCAAAAGACG GTTAATCATC TTGAGCATGG TCGTCTTACC 246 0 

TGACCCAGAA GGCCCTACTA AAACCATAAA TTCCCCATCC TCAATCTGTA AGTTGACATC 252 0 

TCTCAAGACA TCCTTTTCTG TGTAGCGCAG TGCTACATTT TTGTATTCAA TCATTCTTTG 258 0 

TCCTCAATTT AAAACTTCCC TCGATTGGTC AAGTCTTCTA CCTTAGGCAT AACTTCCTTA 2640 

TTATCCCAAT GCTCCACAAT TTTCCCGTTC TCTAAACGGA AGATATCGTA CTGGGCATAA 2700 

GCAACGCCAT CAATCTGAGT CTGACCATAG CTAACCACAT AGTTTCCTTG TCCTAAGAGT 2 7 60 

TGGAAAACAA AGTCAAAAGT GACACTATAT TCAGCCACAT AGTTTTTATA AGCAGCACTT 2 820 

CCTTGTCCAA TATCATGATT ATGCTGAATC AAATCGTCTG CCACATAATC ACTCCACTGC 2 880 

TCTAGCTCCC CATTTTGGAA AATTTCTGTC AAGAAACGGC GAACCAGCTT TTTATTTTCT 2 940 

GCTTTCTTAT CCAAATCCTT GATTTCAAAA TCTCCAAAAA TTTGATCTAG TTGGTCATTT 3 000 

TCAGGTGTTC GATAGTAGTC AATGACATCC CAATGCTCAA CAATACAACC ATTCTCATCC 3 060 

TCACGGAAAG TATCCGTCGT CACCCATTGA GCTTCTCCAC CATTCAGATA TTGATGAACA 3120 

TGAACAAAGA CCAGATTGCC ATCCTCAATG GTGCGGACAA TCTTAATCTG ACGCTCTGGA 3180 

TGACGCTCAA AGAAATCTGC AAAGAAGGCT GCAAATCCTT CTTTCCCGTC AGGAACACCT 3240 

GTCGAATGTT GGATATAGGT ATCCCCTACA GACTGGGCTT GAGCCTCAGC AACTCGTCCG 3300 

TCTTGAATGG CATGGATGTA TAGGTTGTGA GCATTTTTCA CTTGTTGTGA CATATTCTAA 3 3 60 

ACCTCATTTC CCTTCTCTTT CAGATTCGCC AAAATTCTTT CTTGAAAACC TTCAAATTGG 3420 

TGAATTTCTT CCTCTGAAAA TCCTTTGTAA AAGATAGTAT CCAATTTCTG ACTGACACGA 34 80 
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TGCCCCACTT CTTTCTGGGA CTTGCCTAAC TCCGTTAAAA CTAAATACTT CTTACGCTTG 354 0 

TCTTTTCCAC ACGGACTAAC AATTACAAGC TTTTGTTCCT CTAGCTTTTT TATCATAGTC 3600 

GTCAGCGTAT TATTCGCAAG TCCAGTCGCA AGCGCGATAT CTGTCGCAGT TGCGCAGCCA 3660 

GTTTCACTAT TCCATAAAAC CGCTAAAATC TTGCCCTGTT CACCCCTATA AAGAGCCTCA 372 0 

GGATCTTGAC TCAGTAACTT TTGAAAAATC CGCCCATTCA ACAAACGAAT ATGATGGGCT 3 7 80 

AGCAAATGAC CATCTTTCAT AACACCTCCA ATTTATTTCG ATATCGAAAT GAATAAAACA 3 84 0 

ATTGTAACAC TCATCGTTCT AACTGTCAAC TATTTCGATT TAGAAATAAT TTTTGATAAT 3900 

TATCCACACC ACCATACTCC GGCTCAACTA ACTTTTAACG AGAGTTTCTA AACTCCTTCG 3960 

TCCTCCAGTC TACAAAAGCC TTCCATTCGT ACTATCCTAT ATTTTATGAG GGGACACATT 402 0 

TTTCCTATCA GACCATTTAT TTTAAAGATA GAAGTAAATC ATAATTGCTT CCATCTGTTC 4080 

TTTTATAGTA TATTGAAGTT AGACTAGAGC ACTGTATCTT CTAAAACATT GATAGAAAGC 414 0 

GATTTGAATT TCCCAATCAA TTTGTTCGTA TTTATAGCAT TTCGAAACTG GAATAGGACA 42 00 

CCATGACTGC TAAAAGATTT CTATAAATTC ATTTAATTTC CTCAATCAAT TTGTTCATAT 42 60 

CTTATTTCAT TCCGCTATAA TTTCACCTTA CCCTATCTTT TTCGTAGCAC CCTTCAAACA 4 32 0 

GCCTATCCCC TACCGTTTGA CGATTCCTCA CTTCGCTCCA CTTCCATTAC AGAAGTTTCT 43 80 

TCACTACTAT GGGCTCGGCT GACTTCTCAT GATTCCTTGT TACTACTATT TGAACGCTCA 4440 

CGAGATAGAT CTTACAAAAA ATGCTTTGAT CCACAATGGA ATCAAAGCAT TTTAAAGAGT 4500 

TCCTCATACA TAAGCGCAGA AGTCGCAGTT CCTCTGTACT TGGCTTCTTC TCTTTTGACA 4560 

AAGCGAGCCA AGTTGAGCAA CTCAGGTGCT GGATGTTTGG GATTTAGGAG CAATTCACGA 462 0 

TTGACCAGGC CTGAGAGACG AACTGCCTGC AATTGCTCAT TTGTAGTAGG CAGTTTTTTA 4680 

GTAGTCTCTA GGAGAGCAGC AACTAAATCT TCACTCAAAT CATGTCGAGC ATGATTGTAA 474 0 

AGATCTTTTA TAAGGCTTTC TAGGTTTGGT TCTACCATCC CTACCACCTC CCTTATGGTT 4 800 

TAATAATGTT TAATCAAATC AACCGTTGAA CGATCCAATT TCTTCACCAA GGCTTGTAAG 48 60 

AAAGCTTGCG CTTCTAGGAA GTCATCCATT GCATAGAGGG TTTGGTGAGA ATGGATATAA 492 0 

CGAGCGCAGA CACCGATAGT TGTTGATGGG ACACCACCAT TTTTCAGATG AGCTGCACCT 4980 

GCATCTGTTC CGCCTTTACC ACAGTAGTAT TGGTACTTGA TACCAGCTTC TTCAGCCGTT 504 0 

GTCAAAAGGA AATCCTTCAT CCCTGGGAGA AGCAAGTGAC CTGGATCATA GAAACGAATC 5100 

AAGGTTCCAT CTCCAATCTT GCCTTGACCA CCGTAGACAT CACCTGCTGG TGAGCAATCA 5160 

ACTGCGAGGA AGACTTCTGG GTCAAACTTG C-TTGTAGAGG TATGAGCGCC ACGCAGACCA 5220 

ACTTCTTCTT GGACGTTAGA ACCCAGATAG AGTTCATTGC CGAGTTTTTG ACCCGATAAA 5280 
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GCTTCAGCTA GCTCGCTTAC CATGAGGACA CCGTAGCGGT TATCCCAAGC TTTTGAGATG 5340 

ATATTTTTTT CATTGGCTGT CAAAATTGCA GAACTATCTG GTACAATGGT ATCACCAGGA 54 00 

CGGATGCCAA AACTTTCTGC CTCAGCCTTG TCCGCAAAAC CACCATCAAA AACGATATCG 5460 

GCAATGGCTG GCATGGTTGG TCCCCCCTTT CCACGAGTCA AATGCGGAGG AACAGAACCT 5520 

GAAATCACAG GAATTTCATG ACCATCACGA GTCAAGAGTT TGAAACGTTG GCTGCTAACC 5580 

ACCATGGGGT TCCAGCCACC GATTTCTACG ACACGGAAGG TACCATCTGG CTTGATTTCG 5640 

CTGACCATAA AACCAACTTC GTCCATATGA GAAGCGACCA AGACGCGCGG TGCATCCACA 5700 

GCTTCTGAAT GTTTGATACC AAAAATACCA CCCAAGCCAT CTGTCACCAC TTCATCCACA 57 60 

TGCGGTGTCA ACTTTTCACG AAGATAAGCA CGGACAGGCG CTTCATGACC TGAGACTGCA 582 0 

GCAAGTTCTG TTACTTCTTT AATTTTTGAA AATAATGTTG TCATTTCAGT TCCTTCTTTC 5880 

TTTCATCCAT TTTACCACTT TTTATAGGAG AAGGATAGTG GGAAGGTGGA TTTCTAAGTT 594 0 

AGTATCTTAG TCCTGCTCTA TCTTAGAAAA GGATAGTATT CTCTTGCATG TAGTGCAAAA 600 0 

TCTAGTAAAC ATTCCAAAAT TAACTCGAAT ATTTATTTCC AAACAAAAAA ACAATACACC 6060 

ATCAAAGTTG TTTGGATTTT TCATGAAATT TACAGAAAAT AGTTGACTTC CCTTTCTTCT 6120 

TTCTTTAAAT ATATAGTTGG TTGAGTTTGG AATAGTACGC TGTAGCTGCT AAAACATTTC 6180 

TAGAAATTAA TTTGACTTTC CTAATAGAGT TGTTCATATC TTATTTCAAT TTACTATAGT 6240 

ACAAAACTAG AAAAGGAAAA AATCATGACC AGG 6273 
(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH : 28171 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

ACAACCTTTT TCAAAAACTC ACCTTGGTAC GGAGATGTTT TGCTTTCTGC TATTATTTTC 60 

GGTTATATTC ATATCAATTT TGCTTTAACT CCTCTTGCTT TTTTCATTTA TGCTAGTGGA 120 

GGTCTTATTT TAGCTCTATT GTATCGCATG ACTAAAAATC TCTACTATCC AATACTAGTT 180 

CATATTCTCA TTAATATCAC TGCCTTCTGG GATGTGTGGT TGCTCCTATT TTCAGGAAGT 2 40 

TAGCTTACTA AAATAATGTC GGAACTTTCC GGCATTTTCT TTTTTCACAA ATAGTCAACG 300 

TTTTTCTTTT CGATATTGTA GTGGTGTGTA TCCAGTTATT TTTTTGAATT GATTTTGAAA 3 60 
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ATAAGGTTGA CTTGAGAAAG GCAGATAGTG AAGATAGTTA AGAAGAATAG GATGTTCTTT 42 0 

TTTCCTTTTT GGAAAACTTC TAAAATATGG TATAATGAAA AGATAAAGAA GTTGGGGGTA 480 

GAAGATGAAC ATTCAACAAT TACGCTATGT TGTGGCTATT GCCAATAGTG GTACTTTTCG 540 

TGAAGCTGCT GAAAAGATGT ATGTTAGTCA GCCGAGTCTG TCTATTTCTG TTCGTGATTT 600 

GGAAAAAGAG TTGGGCTTTA AGATTTTCCG TCGGACCAGC TCAGGGACTT TCTTGACCCG 660 

TCGTGGGATG GAATTTTATG AAAAATCGCA AGAATTGGTT AAAGGATTTG ATATTTTTCA 720 

AAATCAGTAT GCCAATCCTG AAGAAGAAAA AGATGAATTT TCTGTTGCTA GCCAGCACTA 780 

TGACTTCTTG CCACCAACTA TTACGGCCTT TTCAGAGCGC TATCCTGACT ATAAGAACTT 84 0 

CCGTATTTTT GAATCAACTA CTGTTCAAAT ATTAGATGAA GTGGCGCAAG GGCATAGTGA 900 

GATTGGGATT ATCTACCTCA ACAATCAAAA TAAAAAGGGG ATTATGCAAC GGGTTGAAAA 9 60 

ATTAGGTCTG GAGGTCATCG AATTGATTCC TTTCCATACC CATATTTATC TCCGTGAGGG 102 0 

TCATCCTTTA GCCCAGAAAG AGGAATTAGT CATGGAGGAT TTAGCGGATT TACCAACGGT 10 80 

TCGTTTCACT CAAGAGAAAG ACGAGTACCT TTATTATTCA GAGAACTTTG TCGATACCAG 114 0 

CGCTAGCTCA CAGATGTTTA ATGTGACAGA CCGTGCCACC TTGAATGGTA TTTTGGAGCG 1200 

GACGGACGCC TATGCGACAG GTTCTGGATT TTTAGATAGT GACAGTGTTA ATGGCATTAC 12 60 

AGTTATTCGT CTCAAGGATA ACCTAGATAA CCGCATGGTC "ATGTTAAAC GTGAAGAAGT 1320 

GGAGCTTAGT CAAGCTGGGA CTCTCTTCGT AGAAGTCATG CAAGAATATT TTGATCAAAA 1380 

GAGGAAATCA TGAAAAAAAG AGCAATAGTG GCAGTCATTG TACTGCTTTT GATTGGGCTG 1440 

GATCAGTTGG TCAAATCCTA TATCGTCCAG CAGATTCCAC TGGGTGAAGT GCGCTCCTGG 1500 

ATCCCCAATT TCGTTAGCTT GACCTACCTG CAAAATCGAG GTGCAGCCTT TTCTATCTTA 1560 

CAAGATCAGC AGCTGTTATT CGCTGTCATT AC'l'CTGGTTG TCGTGATAGG TGCCATTTGG 162 0 

TATTTACATA AACACATGGA GGACTCATTC TGGATGGTCT TGGGTTTGAC TCTAATAATC 1680 

GCGGGTGGTC TTGGAAACTT TATTGACAGG GTCAGTCAGG GCTTTGTTGT GGATATGTTC 1740 

CACCTTGACT TTATCAACTT TGCAATTTTC AATGTGGCAG ATAGCTATCT GACGGTTGGA 1800 

GTGATTATTT TATTGATTGC AATGCTAAAA GAGGAAATAA ATGGAAATTA AAATTGAAAC 1860 

TGGTGGTCTG CGTTTGGATA AGGCTTTGTC AGATTTGTCA GAATTATCAC GTAGTCTCGC 1920 

GAATGAACAA ATTAAATCAG GCCAGGTCTT GGTCAATGGT CAAGTCAAGA AAGCTAAATA 1980 

CACAGTCCAA GAGGGTGATG TCGTCACTTA CCATGTGCCA GAACCAGAGG TATTAGAGTA 2 040 

TGTGGCTGAG GATCTTCCGC TAGAAATAGT CTACCAAGAT GAGGATGTGG CTGTCGTTAA 2100 

CAAACCTCAG GGAATGGTTG TGCACCCGAG TGCTGGTCAT ACCAGTGGAA CCCTAGTAAA 2160 
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TGCCCTCATG TATCATATTA 
TGTTCACCGT ATTGATAAGG 
GCATCTAGCA CTTGCCCAAG 
TGTTCATGGA AATCTACCTA 
AAAAGACCGT AAGAAACAGG 
CGTCTTGGAA CGCTTTGGCG 
TCATCAAATC CGTGTCCACA 
TGGTCCTCGC AAGACTTTGA 
TACTCATCCG AGAACAGGTA 
GGAAACCTTG GAGAGATTGA 
GTAGGCGCTT TTTTAGGTTT 
AATAAAAT C C ACTTTATCAA 
AATGGACATT TTGCCATGGT 
TCTCGCTATC CATGGAGAGA 
GTCTTTCGTC GTTTGAAGGA 
ACCCACAGTG ATCATATTGG 
GTCTATCTTA AGAAATATAG 
CTGTATGGCT ATGATAAGGT 
AATATCACAC AAGGGGATGC 
TATGAAAATG AAACTGATTC 
TCCTTGATTA GCGTGGTGAA 
AATGTTCATG GAGCAGAAGA 
TTTAATCATC ACCATGATAC 
CCGAGTTTGA TTGTTCAAAC 
TATGTTAATT GGCTCAAAGA 
GATGCAACAG TTTTTGATAT 
CCGATTCCAA GTTTTCAAGC 
GCGCCTGATT CTACAGGAGA 
TACTTTAACC AAACGGGTAT 
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AGGACTTGTC GGGTATCAAT GGGGTTCTGC GTCCAGGGAT 222 0 

ATACGTCAGG TCTTCTCATG ATTGCTAAAA ACGATGATGC 22 80 

AACTCAAGGA TAAAAAGTCT CTCCGCAAAT ATTGGGCGAT 23 4 0 

ATGATCGTGG TGTAATTGAA GCGCCGATTG GCCGGAGTGA 24 00 

CTGTAACTGC TAAAGGGAAG CCTGCAGTGA CGCGTTTTCA 24 60 

ATTATAGCTT AGTAGAGTTG CAACTGGAGA CAGGGCGCAC 252 0 

TGGCTTATAT CGGCCATCCA GTCGCTGGTG ATGAGGTCTA 2580 

AAGGACATGG ACAATTTCTT CATGCCAAGA CTTTAGGTTT 2640 

AGACCTTGGA ATTTAAAGCA GATATCCCAG AGATTTTTAA 27 00 

GAAAGTAAGA ATGAAAAAGA AATTAACTAG TTTAGCACTT 27 60 

GTCATGGTAT GGGAATGTTC AGGCTCAAGA AAGTTCAGGA 2 820 

TGTTCAAGAA GGTGGCAGTG ATGCGATTAT TCTTGAAAGC 2880 

GGATACAGGA GAAGATTATG ATTTCCCAGA TGGAAGTGAT 2 94 0 

AGGAATTGAA ACGTCTTATA AGCATGTTCT AACAGACCGT 3000 

ATTGGGTGTC CAAAAACTTG ATTTTATTTT GGTGACCCAT 3060 

AAATGTTGAT GAATTACTGT CTACCTATCC AGTTGACCGA 3120 

TGATAGTCGT ATTACTAATT CTGAACGTCT ATGGGATAAT 3180 

TTTACAGACT GCTGCAGAAA AAGGTGTTTC AGTTATTCAA 324 0 

TCATTTTCAG TTTGGGGACA TGGATATTCA GCTCTATAAT 3300 

ATCGGGTGAA TTAAAGAAAA TTTGGGATGA CAATTCCAAT 3 3 60 

AGTCAATGGC AAGAAAATTT ACCTTGGGGG CGATTTAGAT 342 0 

CAAGTATGGT CCTCTCATTG GAAAAGTTGA TTTGATGAAG 34 80 

CAACAAATCA AATACCAAGG ATTTCATTAA AAATTTGAGT 3 540 

TTCGGATAGT CTACCTTGGA AAAATGGTGT TGATAGTGAG 3600 

ACGAGGAATT GAGAGAATCA ACGCAGCCAG CAAAGACTAT 3 6 60 

TCGAAAAGAC GGTTTTGTCA ATATTTCAAC ATCCTACAAG 372 0 

TGGTTGGCAT AAGAGTGCAT ATGGGAACTG GTGGTATCAA 37 8 0 

GTATGCTGTC GGTTGGAATG AAATCGAAGG TGAATGGTAT 3840 

CTTGTTACAG AATCAATGGA AAAAATGGAA CAATCATTGG 3900 



WO 98/18931 



PCT/US97/19588 



276 

TTCTATTTGA CAGACTCTGG TGCTTCTGCT AAAAATTGGA AGAAAATOGO TGGAATCTGG 3 9 60 

TATTATTTTA ACAAAGAAAA CCAGATGGAA ATTGGTTGGA TTCAAGATAA AGAGCAGTGG 4020 

TATTATTTGG ATGTTGATGG TTCTATGAAG ACAGGATGGC TTCAATATAT GGGGCAATGG 4080 

TATTACTTTG CTCCATCAGG GGAAATGAAA ATGGGCTGGG TAAAAGATAA AGAAACCTGG 4140 

TACTATATGG ATTCTACTGG TGTCATGAAG ACAGGTGAGA TAGAAGTTGC TGGTCAACAT 4200 

TATTATCTGG AAGATTCAGG AGCTATGAAG CAAGGCTGGC ATAAAAAGGC AAATGATTGG 42 60 

TATTTCTACA AGACAGACGG TTCACGAGCT GTGGGTTGGA TCAAGGACAA GGATAAATGG 4320 

TACTTCTTGA AAGAAAATGG TCAATTACTT GTGAACGGTA AGACACCAGA AGGTTATACT 4 380 

GTGGATTCAA GTGGTGCCTG GTTAGTGGAT GTTTCGATCG AGAAATCTGC TACAATTAAA 4440 

ACTACAAGTC ATTCAGAAAT AAAAGAATCC AAAGAAGTAG TGAAAAAGGA TCTTGAAAAT 4 500 

AAAGAAACGA GTCAACATGA AAGTGTTACA AATTTTTCAA CTAGTCAAGA TTTGACATCC 4 560 

TCAACTTCAC AAAGCTCTGA AACGAGTGTA AACAAATCGG AATCAGAACA GTAGTAGAAA 4 520 

AGAAGGTTTT AGGGCCTTCT TTTTCCTATC AACTCTTTTC TATTTCCTGT TATTCATGTT 4 680 

ATAATGGATA AATATGAATA ATCGGAGTGA GACTATGAAA TACAAACGGA TTGTCTTTAA 4 740 

GGTGGGTACT TCTTCTCTGA CAAATGAGGA TGGAAGTTTA TCACGTAGTA AGGTAAAGGA 4 800 

TATTACCCAG CAGTTGGCTA TGCTGCACGA GGCTGGTCAT GAGTTGATTT TGGTGTCTTC 4860 

AGGTGCCATT GCGGCTGGTT TTGGAGCCTT AGGATTTAAA AAGCGTCCGA CTAAGATTGC 4 920 

TGATAAACAG GCTTCAGCAG CGGTAGGGCA GGGGCTTTTG TTGGAAGAAT ATACAACCAA 4 9 80 

TCTTCTCTTG CGTCAAATCG TTTCTGCACA AATCTTGCTG ACCCAAGATG ACTTTGTGGA 5040 

TAAGCGTCGT TATAAAAATG CCCATCAGGC TTTGTCGGTT TTGCTCAACC GTGGGGCAAT 5100 

TCCTATCATC AATGAGAATG ATAGTGTCGT TATTGATGAG CTCAAGGTTG GGGACAATGA 5160 

CACTCTAAGT GCTCAAGTAG CGGCGATGGT CCAAGCAGAC CTTTTAGTTT TCTTGACAGA 52 20 

TGTGGACGGT CTCTATACTG GAAATCCTAA TTCAGATCCA AGAGCCAAAC GCTTGGAGAG 52 80 

AATCGAGACC ATCAATCGTG AGATTATTGA TATGGCTGGT GGAGCTGGTT CGTCAAACGG 53 40 

AACTGGGGGT ATGTTAACCA AAATCAAGGC TGCAACTATC GCGACGGAAT CAGGAGTTCC 5400 

TGTTTATATC TGCTCATCCT TGAAATCAGA TTCCATGATT GAGGCGGCAG AGGAGACCGA 54 50 

GGATGGTTCT TACTTTGTTG CTCAAGAGAA GGGGCTTCGT ACCCAGAAAC AATGGCTTGC 5520 

CTTCTATGCT CAGAGTCAAG GTTCTATTTG GGTTGATAAA GGGGCTGCGG AAGCTCTCTC 5580 

TCAATATGGA AAGAGTCTTC TCTTATCTGG TATCGTTGAA GCAGAAGGAG TCTTTTCTTA 5640 

CGGTGATATC GTGACAGTAT TTGACAAGGA AAGTGGAAAA TCACTTGGAA AAGGACGCGT 5700 
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GCAATTTGGA GCATCTGCTT TGGAGGATAT GTTGCGTTCT CAAAAAGC C A AGGGTGTCTT 57 60 

GATTTACCGT GACGACTGGA TTTCCATTAC TCCTGAAATC CAACTACTTT TTACAGAATT 5820 

TTAGAGGTAA ACTATGGTGA GTAGACAAGA ACAATTTGAA CAGGTACAGG CTGTTAAAAA 5 880 

ATCGATTAAC ACAGCTAGTG AAGAAGTGAA AAACCAAGCC TTGCTAGCCA TGGCTGATCA 5940 

CTTAGTGGCT GCTACTGAGG AAATTTTAGC GGCTAATGCC CTCGATATGG CAGCGGCTAA 6000 

GGGGAAAATC TCAGATGTGA TGTTGGATCG TCTTTATTTG GATGCAGATC GTATAGAAGC 6060 

GATGGCAAGA GGAATTCGTG AAGTGGTTGC CTTACCAGAT CCAATCGGTG AAGTTTTAGA 6120 

AACAAGTCAG CTTGAAAATG GTTTGGTTAT CACAAAAAAA CGTGTAGCTA TGGGTGTCAT 6180 

CGGTATTATC TATGAAAGCC GTCCAAATGT GACGTCTGAT GCGGCTGCTT TGACTCTTAA 62 40 

GAGTGGAAAT GCGGTTGTTC TTCGTAGTGG TAAGGATGCC TATCAAACAA CCCATGCCAT 63 00 

TGTCACAGCC TTGAAGAAGG GCTTGGAGAC GACTACTATT CATCCAAATG TGATTCAACT 63 60 

GGTGGAGGAT ACTAGCCGTG AAAGTAGTTA TGCTATGATG AAGGCCAAGG GCTATCTAGA 64 2 0 

CCTTCTCATT CCTCGTGGAG GAGCTGGCTT GATCAATGCA GTGGTTGAGA ATGCGATTGT 64 80 

ACCTGTTATC GAGACAGGGA CTGGGATTGT CCATGTCTAT GTGGATAAGG ATGCAGACGA 654 0 

AGACAAGGCG CTGTCTATCA TCAACAATGC TAAAACCAGT CGTCCTTCTG TTTGTAATGC 6600 

CATGGAGGTT CTGCTGGTTC ATGAAAACAA GGCAGCAAGC TTCCTTCCTC GCTTGGAGCA 6660 

AGTGTTGGTT GCAGAGCGTA AGGAAGCTGG ACTGGAACCA ATTCAATTCC GCCTAGATAG 6720 

CAAAGCAAGC CAGTTTGTTT CAGGTCAAGC AGCTGAGACC CAAGACTTTG ACACCGAGTT 6780 

TTTAGACTAT GTCCTTGCTG TTAAGGTTGT GAGCAGTTTA GAAGAAGCGG TTGCGCACAT 684 0 

TGAATCCCAC AGCACCCATC ATTCGGATGC TATTGTGACG GAAAATGCTG AAGCTGCAGC 6900 

ATACTTTACA GATCAAGTGG ACTCTGCAGC GGTGTATGTT AATGCCTCAA CTCGTTTCAC 69 60 

AGATGGAGGA CAATTTGGTC TTGGTTGTGA AATGGGGATT TCTACTCAGA AATTGCACGC 7020 

GCGTGGTCCC ATGGGCTTGA AAGAGTTGAC CAGCTACAAG TATGTGGTTG CCGGTGATGG 708 0 

GCAGATAAGG GAGTAAGAGA TGAAGATTGG ATTTATCGGT TTGGGGAATA TGGGTGCTAG 7140 

CTTGGCAAAA TCTGTCTTGC AGACTAGGAC GTCAGATGAG ATTCTCCTTG CCAATCGTAG 72 00 

TCAAGCTAAG GTAGATGCTT TCATTGCAGA CTTTGGTGGT CAGGCTTCCA GCAATGAAGA 7260 

AATGTTTGCA GAAGCAGATG TGATTTTTCT AGGAGTTAAC- CCTGCTCAGT TTTCTGAACT 7320 

GCTTTCTCAA TACCAGACCA TCCTTGAAAA AAGAGAAAGT CTTCTTTTGA TTTCGATGGC 7380 

AGCTGGATTG ACCTTAGAAA AACTAGCAAG TCTTATCCCA AGTCAAC AC C GAATTATTCG 744 0 
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TATGATGCCT AATACCCCTG CTTCTATCGG GCAAGGAGTG ATTAGTTATG CCTTGTCTCC 7 500 

TAATTGCAGG GCTGAGGACA GTGAGCTCTT TTATCAGCTT TTAGCCAAGG CTGGTCTCTT 75 60 

GGTTGAACTA GGAGAAAGTT TAATCGATGC AGCGACAGGT CTTGCAGGTT GTGGACCAGC 7 62 0 

CTTTGTCTAT CTTTTTATCG AGGCCTTGGC AGATGCAGGT GTTCAGACAG GATTACCACG 7 6 80 

AGAAAT AG C A TTGAAAATGG CAGCACAAAC TGTGGTAGGA GCTGGGCAAT TGGTCCTTGA 774 0 

AAGTCAGCAA CATCCTGGAG TATTGAAAGA CCAAGTCTGT AGCCCAGGCG GTTCGACTAT 7 800 

CGCTGGTGTA GCAAGCCTAG AAGCGCATGC TTTCCGAGGA ACAGTCATGG ATGCAGTTCA 78 60 

TCAAGCCTAC AAACGAACAC AAGAACTAGG TAAATAAGAG GTAGTTTTGA CTGCCTCTTT 792 0 

TATGGTGGCT GAAATGAGAA GACACAAAAA GATTGTCACA AACCCCTATT TTTTTGATAG 7980 

AATAGAAGTA GTAAAAAAGA AATGAGTTAG ACATGTCAAA AGGATTTTTA GTCTCTCTTG 804 0 

AGGGACCAGA GGGAGCAGGC AAGACCAGTG TTTTAGAGGC TCTGCTACCA ATTTTAGAGG 8100 

AAAAAGGAGT AGAGGTGTTG ACGACCCGTG AACCTGGCGG AGTCTTGATT GGGGAGAAGA 8160 

TTCGGGAAGT GATTTTGGAT CCAAGTCATA CTCAGATGGA TGCTAAAACA GAGCTACTTC 822 0 

TCTATATTGC CAGTCGCAGA CAGCATTTGG TGGAAAAAGT TCTTCCAGCC CTTGAAGCTG 82 80 

GCAAGTTGGT CATCATGGAT CGTTTTATCG ATAGTTCTGT TGCCTATCAG GGATTTGGTC 8340 

GTGGCTTAGA TATTGAAGCC ATTGACTGGC TCAATCAGTT TGCGACAGAT GGCCTCAAAC 8400 

CCGATTTGAC ACTCTATTTT GACATCGAGG TGGAAGAAGG GCTGGCTCGT ATTGCTGCTA 8460 

ATAGTGACCG CGAGGTTAAT CGTTTGGATT TGGAAGGGTT GGACTTGCAT AAAAAAGTTC 8520 

GTCAAGGCTA CCTTTCTCTT CTGGATAAAG AGGGAAATCG CATTGTCAAG ATTGATGCTA 8580 

GTCTCCCTTT GGAGCAAGTT GTGGAAACTA CCAAGGCTGT CTTGTTTGAC GGAATGGGCT 864 0 

TGGCCAAATG AAACAAGATC AACTAAAGGC TTGGCAACCA GCTCAGTTTG ACCGTTTTGT 8700 

CCGTATCTTA GAACAAGACC AGCTCAATCA CGCCTATCTC TTTTCAGGTT TCTTTGAAAG 876 0 

CTTGGAAATG GCGCAATTTT TAGCTAAGAG CCTCTTTTGT ACGGATAAAG TTGGCGTCTT 882 0 

ACCATGTGAG AAATGCCGAA GTTGCAAGCT GATTGAACAG GGAGAATTTC CCGATGTCAC 888 0 

CTTGATTAAA CCAGTTAATC AGGTCATTAA GACGGAACGC ATTCGAGAAT TGGTGGGTCA 8940 

GTTTTCTCAA GCAGGGATTG AAAGCCAGCA ACAGGTCTTT ATCATCGAGC AAGCGGATAA 9000 

AATGCATCCC AACGCAGCCA ATTCTCTGCT CAAGGTCATC GAAGAACCCC AGAGTGAAGT 9060 

TTATATTTTC TTCTTGACTA GCGATGAGGA AAAGATGTTA CCGACAATCC GAAGTCGGAC 9120 

TCAGATCTTC CACTTTAAAA AGCAAGAAGA AAAACTTATC TTACTCTTAG AACAAATGGG 9180 

ACTTGTTAAG AAAAAAGCGA CTCTTTTAGC TAAGTTTAGT CAATCGCGAG CTGAAGCAGA 9240 
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AAAGTTGGCT AATCAGGCAA GTTTTTGGAC CTTGGTCGAT GAAAGTGAAC GCCTGCTGAC 93 00 

TTGGTTAGTA GCTAAGAAAA AAGAAAGTTA TCTACAGGTT GCCAAATTAG CCAACTTGGC 93 60 

AGATGATAAG GAAAAACAGG ATCAGGTTTT ACGGATTCTT GAAGTTCTCT GTGGGCAGGA 9420 

CCTCTTGCAG GTAAGAGTAA GAGTGATTCT ACAAGATTTA CTAGAAGCTA GAAAAATGTG 9480 

GCAAGCTAAT GTCAGCTTTC AAAATGCCAT GGAATATCTG GTCTTGAAAG AAATATAAAC 9 540 

TCAAAAATGA ATGATAAAGA AAGGAAAGGG CTGTTTTATG GACAAAAAAG AATTATTTGA 9600 

CGCGCTGGAT GATTTTTCCC AACAATTATT GGTAACCTTA GCCGATGTGG AAGCCATCAA 9 660 

GAAAAATCTC AAGAGCCTGG TAGAGGAAAA TACAGCTCTT CGCTTGGAAA ATAGTAAGTT 9720 

GCGAGAACGC TTGGGTGAGG TGGAAGCAGA TGCTCCTGTC AAGGCCAAGC ATGTTCGTGA 97 80 

AAGTGTCCGT CGCATTTACC GTGATGGATT TCACGTATGT AATGATTTTT ATGGACAACG 9840 

TCGAGAGCAG GACGAGGAAT GTATGTTTTG TGACGAGTTG CTATACAGGG AGTAGGCATG 9900 

CAGATTCAAA AAAGTTTTAA GGGGCAGTCT CCCTATGGCA AGCTGTATCT AGTGGCAACG 9 9 60 

CCGATTGGCA ATCTAGATGA TATGACTTTT CGTGCTATCC AGACCTTGAA AGAAGTGGAC 10020 

TGGATTGCTG CTGAGGATAC GCGCAATACA GGGCTTTTGC TCAAGCATTT TGACATTTCC 10080 

ACCAAGCAGA TCAGTTTTCA TGAGCACAAT GCCAAGGAAA AAATTCCTGA TTTGATTGGT 10140 

TTCTTGAAAG CAGGGCAAAG TATTGCTCAG GTCTCTGATG CCGGTTTGCC TAGCATTTCA 102 00 

GACCCTGGTC ATGATTTAGT TAAGGCAGCT ATTGAGGAAG AAATTGCAGT TGTGACAGTT 102 60 

CCAGGTGCCT CTGCAGGAAT TTCTGCCTTG ATTGCCAGTG GTTTAGCGCC ACAGCCACAT 10320 

ATCTTTTACG GTTTTTTACC GAGAAAATCA GGTCAGCAGA AGCAATTTTT TGGCTTGAAA 103 80 

AAAGATTATC CTGAAACACA GATTTTTTAT GAATCACCTC ATCGTGTAGC AGACACGTTG 10440 

GAAAATATGT TAGAAGTCTA CGGTGACCGC TCCGTTGTCT TGGTCAGGGA ATTGACCAAA 10500 

ATCTATGAAG AATACCAACG AGGTACTATC TCTGAGTTAT TAGAAAGCAT TGCTGAAACG 105 60 

CCACTCAAGG GCGAATGTCT TCTCATTGTT GAGGGTGCCA GTCAGGGTGT GGAGGAAAAG 10620 

GACGAGGAAG ACTTGTTCGT AGAAATTCAA ACCCGCATCC AGCAAGGTGT GAAGAAAAAC 10680 

CAAGCTATCA AGGAAGTCGC TAAGATTTAC CAGTGGAATA AAAGTCAGCT CTACGCTGCC 10740 

TACCACGACT GGGAAGAAAA ACAATAAAGG GAGACAGGAT GTAATAATTC TGTCTGTTTC 10800 

TGTTTAACTT AATTAGTGAT GATAATATAA AGATGTATCA CTTGGTATAG AAGCTTTGGT 10 8 60 

ATTAAGTTTT TTATTAAGCC CATACGGAAT ACCGATGGTT GGAGCAGCAG TTATAGCGTT 1092 0 

CTTAGAAGGT ATAAATAGAA AAATAAGGTC ATTTTAAATC AAAGGATTGA TAAATCAGAA 10980 
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AGAAGGTGAT TTTTTGCGAA CATACGAAAA TAAAGAAGAA CTAAAAGCTG AGATAGAGAA 11040 

AACATTTGAG AAATATATTT TAGAATTTGA TAATATTCCA GAAAATTTAA AAGATAAGAG 11100 

AGCTGATGAA GTTGACAGAA CTCCAGCAGA AAACCTTGCT TATCAGGTTG GTTGGACCAA 11160 

CTTGGTTCTT AAATGGGAAG AAGATGAAAG AAAGGGGCTT CAAGTAAAAA CACCATCGGA 1122 0 

TAAATTTAAA TGGAATCAAC TTGGTGAATT ATATCAGTGG TTCACAGATA CCTACGCTCA 11280 

TTTATCTCTG CAAGAGTTGA AAGCAAAATT AAATGAAAAT ATTAATTCTA TCTCTGCAAT 11340 

GATTGATTCG TTGAGTGAGG AAGAATTATT TGAACCGCAT ATGAGAAAGT GGGCTGATGA 11400 

AGCGACTAAA ACAGCGACTT GGGAAGTGTA TAAGTTTATT CATGTAAATA CGGTTGCACC 1146 0 

TTTTGGAACT TTCAGAACTA AAATCAGAAA ATGGAAGAAG ATAGTATTAT AAATTATATT 1152 0 

TTTAACTTTA AAAAATTTCA TAAAAATGGT TACCAAAGGC GATAGAAGAA AAACTATCGT 1158 0 

CTTTTTCTTT GCAAATTTTT AAGAAGGGAG GTGATCTTGC ATGGACTTTG AATATTTTTA 11640 

TAACAGAGAA GCGGAAAGAT TTAACTTCTT AAAAGTACCG GAGATATTAG TTGATAGAGA 11700 

AGAATTTCGG GGCTTATCAG CAGAAGCAAT TATCCTTTAT TCCATACTTC TTAAACAGAC 117 60 

AGGAATGTCA TTTAAGAATA ACTGGATAGA CAAGGAAGGC AGAGTATTTA TCTATTTTAC 11820 

TGTCGAAGAA ATTATGAAAA GAAGAAATAT CTCAAAGCCA ACTGCCATAA AAACATTAGA 1188 0 

TGAGCTTGAT GTAAAAAAGG AATAGGACTG ATCGAAAGAG TAAGGCTTGG ACTTGGTAAG 11940 

CCGAACATCA TTTATGTTAA AGACTTTATG AGTATATTTC AGGTAAAAGA AAATGACTTA 12000 

CAGAAGTCAA AAAACTTAAC TTCAGAAGTA AAAGATTTTA ACCTCAGAAG TAAAGAAAAT 12060 

GAACTTCAAG AGGTTAAGAA CCTTGACTCT AACTATATAG AGAATAATAA GAGTAAGTAT 12120 

AGTAAGAGAG AATATAGTTT TGGTGAAAAC GGACTTGGAA CATTTCAAAA TGTGTTTTTA 12180 

GCTGCTGAAG ATATATCGGA TTTACAAATC ATAATGAACT CACAGCTTGA GAATTACATT 12240 

AGACTTCCTG CAAAACTAGA ATCCTAGTTC ATGATTGATA ATGCCAGCAA TCAAATTCAT 12300 

TCGTAATCCG AAGCGTTTAC GATGATTTCG ATAGATTGTT GAAAACATTT TAAACGTTTT 12360 

TACTTTGGCA AAGATGTTCT CAATCTTGCT TCTCTCCTTG GATAGCGCAT GGTTACAGGC 12420 

TTTATCTTCA GCTGTTAGCG GCTTGAGTTT GCTGGATTTA CGTGGAGTTT GTACTTGAGG 12480 

ATATATCTTC ATGAGCCCTT GATAACCACT GTCAGACAAG ATTTTACCAG CTTGTCCGAT 12540 

ATTTCTGCGA CTCATTTTGA ACAACTTCAT ATCACGACAA TAGTTCACAG CGATATCCAA 12600 

AGAAACAATT CTCCCTTGAC TTGTGACAAT CGCTTGAGCC TTCATAGCGT GAAATTTCTT 12660 

TTTACCAGAA TGATTCGCTA ATTCTTTTTT TAGGGCGATT GATTTTTACT TCCGTCGCAT 12720 

CAATCATTAC CGTGTCCTCA GAACTGAGAG GAGTTCTTGA AATCGTAACA CCACTTTGAA 12 780 
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CAAGAGTTAC TTCAACCCAT TGGCTCCGAC GGATTAAGTT GCTTTCGTGA ATACCAAAAT 12840 

CAGCCGCAAT TTGTTCATAA GTTCGATATT CTCGCACATA TTGAAGAGTG GCCATAAGAA 129 00 

GGTCTTCTAG GCTTAATTTA GGTTTTCGTC CACCTTTTGC GTGTTTAAGT TGATAAGCTG 129 60 

TTTTTAATAC AGCTAATATC TCTTCAAAAG TCGTGCGCTG AACACCAACA AGACGCTTAA 13020 

ATCGTGCATC AGTTAGTTGT TTACTTGCTT CATCATTCAT AGAACTACTA TACCATATTT 130 80 

TGTTTCGCAG GAAGTCTATT GGAAAGTAAG AAATATTGAA GCTGAGGCTA TTAGAAGAAA 13140 

TTGTGAGCGT GGTGCTATTT TTTCAGGTAA AATAAAATAT CACGAAGATT CACAGTTTAA 13200 

AGGAGATCAC TATGTTGAAT GTTATGCTGT TTTAGATAAT ACGGTTATAG CAAGAGATAG 13260 

AATAACAGTC CCTATCGATC CGTTATGTGG AAAAGATTTT ATAGAGTAGC ATATAATTGA 1332 0 

TTCTTAACTG GAATACTCAC TATCTCTTTA CATCAAGAAA ATGACTAAAC AGGGAAGTTT 133 80 

GCCTTCTTCC CTTTTTTTGT TATACTAGTA GAAGAAAAAA TTAGAAAGAT TTGTGGGTGT 1344 0 

CAAACAGCCC AGTGGGGTGT TTTAATATGG ACTTAGGTCC CACCCAAAGA GGTATTAGTG 135 00 

TCGTGTCTCA ATCTTATATC AATGTTATCG GTGCTGGTTT GGCAGGTTCT GAAGCAGCTT 13560 

ACCAAATCGC AGAGCGTGGT ATTCCAGTTA AACTATATGA AATGCGTGGT GTCAAGTCTA 13 62 0 

CACCCCAGCA TAAAACAGAC AATTTTGCTG AGTTGGTTTG TTCCAATTCT TTGCGTGGGG 136 80 

ATGCTTTGAC AAATGCAGTT GGTCTTCTCA AGGAAGAAAT GCGTCGCTTG GGTTCTGTTA 13740 

TCTTGGAATC TGCTGAGGCT ACACGTGTTC CTGCAGGTGG TGCCCTTGCA GTGGACCGTG 13800 

ATGGTTTCTC TCAAATGGTG ACCGAAAAAG TTGCCAACCA CCCCTTGATT GAAGTGGTTC 13860 

GTGATGAAAT TACAGAATTG CCGACAGATG TTATTACGGT TATCGCTACT GGTCCTTTGA 13920 

CAAGTGATGC CTTGGCTGAA AAGATTCATG CTCTTAATGA CGGTGCTGGT TTTTATTTCT 13980 

ACGATGCGGC AGCGCCTATT ATCGATGTCA ACACTATCGA TATGAGCAAG GTCTACCTCA 14040 

AATCACGTTA TGATAAGGGA GAAGCGGCCT ACCTCAATGC CCCTATGACC AAGCAAGAAT 14100 

TTATGGATTT CCATGAAGCT TTGGTCAATG CAGAAGAAGC ACCGCTTAGT TCTTTTGAAA 1416 0 

AAGAAAAGTA CTTTGAAGGA TGTATGCCTA TCGAAGTCAT GGCCAAACGT GGCATTAAAA 14220 

CTATGCTTTA TGGCCCTATG AAGCCAGTCG GTCTTGAGTA CCCAGACGAC TATACAGGAC 142 80 

CTCGTGATGG AGAATTTAAA ACACCTTATG CGGTTGTGCA ACTTCGTCAG GATAATGCAG 14340 

CTGGTAGCCT CTACAATATT GTTGGTTTCC AGACCCACCT CAAATGGGGA GAACAAAAGC 14400 

GTGTCTTCCA AATGATTCCG GGTCTTGAAA ATGCGGAGTT TGTCCGTTAT GGTGTGATGC 14460 

ATCGCAATTC TTACATGGAT TCACCAAATC TTCTTGAGCA GACTTACCGT TCTAAGAAAC 14520 
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AACCAAATCT CTTCTTTGCT GGTCAAATGA CGGGTGTGGA AGGCTATGTT GAGTCGGCGG 14580 

CTTCAGGCTT AGTTGCGGGA ATTAACGCAG CTCGTCTCTT CAAGGAAGAA AGCGAGGCTA 14640 

TTTTCCCCGA GACGACAGCG ATTGGAAGCT TAGCTCATTA CATTACCCAT GCCGACAGCA 14700 

AACATTTCCA ACCAATGAAT GTCAATTTTG GGATCATCAA GGAGTTGGAA GGCGAGCGTA 14760 

TCCGTGATAA GAAGGCTCGT TATGAAAAAA TTGCAGAGCG TGCCCTTGCC GACTTAGAGG 1482 0 

AATTTTTGAC TGTCTAATTT TTTTGAAAGA ATTGCTCATG ATACTATAAA AATCTTAGAA 14880 

ATTGTGATAA AATAGGTAGG AT G AAAGAAG GAGAGTGAAA ATGGCGAATC CCAAGTATAA 14940 

ACGTATTTTA ATCAAGTTAT CAGGTGAAGC CCTTGCCGGT GAACGTGGCG TAGGGATTGA 15000 

TATCCAAACA GTTCAAACAA TCGCAAAAGA GATTCAAGAA GTTCATAGCT TAGGTATCGA 15060 

AATTGCCCTT GTTATCGGTG GAGGAAATCT CTGGCGTGGA GAACCTGCAG CAGAAGCAGG 1512 0 

TATGGACCGT GTTCAGGCAG ATTACACAGG AATGCTTGGG ACTGTTATGA ATGCTCTTGT 15180 

GATGGCAGAT TCATTGCAAC AAGTTGGGGT TGATACGCGT GTACAAACAG CTATTGCCAT 15240 

GCAACAAGTG GCAGAGCCTT ATGTCCGTGG ACGTGCCCTT CGTCACCTTG AAAAAGGCCG 153 00 

TATCGTTATC TTTGGTGCTG GAATTGGTTC ACCTTACTTC TCGACAGATA CAACAGCGGC 15360 

CCTTCGTGCA GCTGAAATCG AAGCAGATGC CATCCTCATG GCTAAAAATG GTGTCGATGG 15420 

TGTTTACAAT GCCGATCCTA AGAAAGATAA GACAGCTGTT AAGTTTGAAG AATTGACCCA 15480 

CCGTGACGTT ATCAATAAAG GTCTTCGTAT CATGGACTCA ACAGCTTCAA CCCTCTCAAT 15540 

GGACAACGAC ATTGACTTGG TTGTATTCAA CATGAACCAA CCAGGCAACA TCAAACGTGT 15S00 

CGTATTTGGT GAAAATATCG GAACAACAGT TTCAAATAAT ATCGAAGAAA AGGAATAAGA 15 660 

AAGAATATGG CTAACGCAAT TATTGAAAAA GCTAAAGAGA GAATGACCCA GTCTCACCAA 15720 

TCACTTGCTC GTGAATTTGG TGGTATCCGT GCTGGTCGTG CCAATGCAAG CTTGCTTGAC 15780 

CGTGTACATG TAGAATACTA TGGAGTCGAA ACTCCTCTTA ACCAAATCGC TTCAATTACG 15 840 

ATTCCAGAAG CGCGTGTTTT GTTGGTAACA CCATTTGACA AGTCTTCATT GAAAGACATC 15 900 

GAACGTGCCT TGAACGCTTC TGATATTGGT ATCACACCGG CTAATGACGG TTCTGTGATT 15960 

CGCTTGGTTA TCCCAGCTCT TACAGAAGAA ACTCGTCGTG ACCTTGCTAA AGAAGTGAAG 16020 

AAGGTCGGCG AAAATGCTAA AGTGGCTGTC CGCAATATCC GTCGCGATGC TATGGACGAA 16080 

GCTAAGAAAC GAGAAAAAGC AAAAGAAATC ACTGAAGACG AATTGAAGAC TCTTGAAAAA 16140 

GACATTCAAA AAGTAACAGA CGATGCTGTT AAACACATCG ACGACATGAC TGCTAACAAA 162 00 

GAGAAAGAAC TTTTGGAAGT CTAAAAATAA ACAGAAAAAC TCAGTTGGCA TTGCTGGCTG 162 60 

AGTTTTATTC GAAAGAAGGA AATATGAATA CAAATCTTGC AAGTTTTATC GTTGGACTGA 163 20 
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TCATCGATGA AAACGACCGT TTTTACTTTG TGCAAAAGGA TGGTCAAACC TATGCTCTTG 163 80 

CTAAGGAAGA AGGCCAACAT ACAGTAGGGG ATACGGTCAA AGGTTTTGCA TACACGGATA 16440 

TGAAGCAAAA ACTCCGCCTG ACAACCTTAG AAGTGACTGC CACTCAGGAC CAATTTGGTT 16 500 

GGGGACGTGT CACAGAGGTT CGTAAGGACT TGGGTGTCTT TGTGGATACA GGCCTTCCTG 16560 

ACAAGGAAAT CGTTGTGTCA CTCGATATTC TCCCTGAGCT CAAGGAACTC TGGCCTAAGA 16 620 

AGGGCGACCA ACTCTACATC CGTCTTGAAG TGGATAAGAA AGACCGTATC TGGGGCCTCT 16 680 

TGGCTTATCA AGAAGACTTC CAACGTCTTG CTCGTCCTGC CTACAACAAC ATGCAGAACC 16740 

AAAACTGGCC AGCCATTGTT TACCGTCTCA AGCTGTCAGG AACTTTTGTT TACCTACCAG 16800 

AAAATAATAT GCTTGGTTTT ATTCATCCTA GCGAGCGTTA CGCAGAGCCA CGTTTGGGGC 16 860 

AAGTATTAGA TGCGCGCGTT ATTGGTTTCC GTGAAGTGGA CCGCACTCTG AACCTCTCCC 16920 

TCAAACCACG CTCCTTTGAA ATGTTGGAAA ACGATGCTCA GATGATTTTG ACTTATTTGG 16980 

AAAGCAATGG CGGTTTCATG ACCTTAAATG ACAAGTCATC TCCAGACGAC ATCAAGGCAA 17 040 

CCTTTGGCAT TTCTAAAGGT CAGTTCAAGA AAGCTTTAGG TGGTCTTATG AAGGCTGGTA 17100 

AAATCAAGCA GGACCAGTTT GGGACAGAGT TGATTTAGGG AGGCTTATGA GAAAATCATT 17160 

TTACACTTGG CTCATGACCG AGCGCAATCC TAAAAGTAAC AGTCCCAAAG CAATTTTGGC 17220 

AGACCTCGCT TTTGAAGAGT CAGCCTTTCC AAAACACACA GATGATTTTG ATGAGGTCAG 172 80 

TCGCTTTTTG GAGGAGCATG CCAGTTTCTC TTTTAACCTA GGAGATTTTG ACAGCATTTG 17340 

GCAGGAATAT CTAGAACACT AGCATTTATT CATTGGGTTT GGGCTAGTAA TTTCTCCATC 17400 

CCTCTGCTAT AATAAAAAGA AATAAAAGGA TTAGAGAGGT TCTTTATTTG AAGGAACATT 174 60 

CAATAGACAT TCAACTGAGT CATCCAGATG ACCTGTTTCA TCTTTTTGGT TCCAATGAAC 17 52 0 

GCCATCTTCG TTTGATGGAA GAAGAGCTTG ATGTTGTGAT TCATGCTCGT ACGGAGATTG 17580 

TCCAGGTTTT GGGAGAAGAG TCTGCCTGTG AGGAAGCCCG TCAAGTTATT CAGGCTTTGA 17 640 

TGGTCTTGGT AAATCGTGGG ATGACCGTTG GTACGCCAGA TGTAGTCACT GCGATTAGCA 177 00 

TGGTCAAAAA TGATGAAATT GACAAGTTTG TCGCCCTTTA CGAAGAAGAA ATTATCAAGG 177 60 

ATAATACTGG GAAACCTATC CGTGTCAAAA CCCTAGGGCA AAAGCTTTAT GTGGACAGTG 17 82 0 

TCAAACAGCA TGATGTGACC TTTGGAATTG GGCCAGCAGG TACAGGGAAG ACCTTCCTTG 17 880 

CAGTGACCTT GGCAGTGACT GCCCTTAAAC GTGGGCAAGT CAAGCGAATT ATCCTAACTC 17 94 0 

GTCCAGCGGT GGAAGCGGGA GAGAGTCTTG GATTTCTTCC GGGTGATCTT AAGGAGAAGG 180 00 

TGGATCCTTA CCTTCGTCCT GTTTACGATG CCTTGTATCA AATTCTTGGG AAAGACCAAA 18060 
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CGACTCGTCT CATGGAGCGT GAAATTATCG AAATTGCGCC CCTTGCCTAT ATGCGTGGCC 1812 0 

GGACCTTGGA TGATGCCTTT GTCATTCTCG ATGAGGCGCA AAACACGACC ATCATGCAGA 18180 

TGAAGATGTT CTTGACGCGT TTAGGTTTTC ATTCTAAGAT GATTGTCAAT GGAGATATTA 18240 

GTCAGATTGA CCTGCCACGT AATGTCAAGT CCGGTTTGAT TGATGCTCAA GAGAAACTCA 18300 

AG AAC AT C C A TCAGATTGAC TTTGTTCATT TTTCAGCCAA GGATGTGGTT CGCCATCCTG 18360 

TTGTCGCTCA GATTATCCGA GCCTATGAAT ATTCTACTGA AGTTGCACAC GACTGATTTT 18420 

GAGGAAGTTC GCCTGCAAAA GAATAGACTT GTTCGGTAAC TGTAAAAAGT GTTATACTAT 18480 

TTTTATGGAA ACAGTATACG ACAAAGCACA AAAACTTAAC TCAAAAAACT TCAAACTATT 18540 

GATTGGTGTC AAAAAGGAAA CCTTTCAACT CATGCTAGAA CACCTGAATT CAGCCTATCA 18 600 

GATTCAGCAC CGAAAAGGTG GACGTCCACG TAGTCTGCCC ATGGAAGACC AGCTCATTAT 18 660 

GACCCTCCGT TACTTGCGAT ATTATCCCAC TCAGCGTCTG CTGGCCTTTG ATTTTGGCGT 18720 

CGGTGTAGCT ACGGTAAATG CCATCATCAC TTGGGTGGAG GATACACTTC GTGCGTCAGG 18780 

TAGCTTTGAT TTGGACCATT TAGAAGCCCC GAGTGCTGCT GTGGCTATTG ACGTGACCGA 18840 

AAGTCCGATT CAGCGTCCAA ACAAAACCAA AGCAAAAATT ATTCTGGTAA AAAGAAACGA 18900 

CACACCTTAA AAACTCAAAT TATGCTGGAT TTGACGACAC ATAAAGTCTG TCAAATGGCC 18960 

TTTTCTGACG GACATACGCA TGATTTTACT CTCTTCAAAG AAAGTATTGG ACAAAGTTTG 19020 

CCTGAAACGA CGCTTGCCTT TGTTGACCTA GGTTATTTAG GCATCTTGAA ATTTCATGAG 19080 

AATACTTTCA TTCCTGCTAA AAATTCCAAA AATCGCCGCC TGAGTGAGGA TGATAAGCAG 19140 

TTAAATAAAG AGATGTCAGC GATACGAATT GAAATTGAAC ATTTTAACGC TAAATTCAAG 19200 

ACCTTCCAAA TCATGTCAGT CCCTTATCGT AACCGCAGAA AACGTTTCGA GTTACGGGCG 192 60 

GAATTAATTT GTGCCATCAT CAATTATGAA GTGAACTAGA TTCCGAACAA GTCTAATATA 19320 

CTTTTGAGAG AGGAAAATCC AGTTGTATAG GCTAAAGGTT TTATCCAAAG GTCTGAGACA 193 80 

ACGATTAGGC ACGATGGAAA GAACTTTTAT GTGGCTGATG ACGATCAGTG CATCTTCCTG 194 40 

TGTCATAATC ACAGGGCACA AGAAAGTAGG AATTTGAAAA GATGATTGAC CAACTATCTA 19 5 00 

AGTATTACAG TTGTAGGATA CTAACTGAAA AGGATATTCC AAGTATTTTA TCTTTATATG 19560 

AAAGTAATCC TCTGTATTTT CAGCATTGTC CACCAGAGCC AAATTTTGCA ACTGTAAAAG 19 62 0 

AGGACATGCT TTGTCTACCT GAAGGTAAAG CTAAGGCTGA TAAGTTTTTT GTTGGATTTT 19 680 

GGAATGGATC TGACCTTGTG GCTGTTATGG ATTTTGTCTA TGCATATCCT GATGAGGAGA 19 7 40 

CTGTTTTTAT TGGTTTGTTT ATGGTTGATC AAGCCTATCA GAGAAAAGGG ATTGGTAGTC 19800 

ATATTGTGAC AGAAGCACTA GCTTATTTTG CTAAGAACTT TCGAAAGGCA CGTTTGGCTT 19860 
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ATGTTAAGGG AAATCCGCAA TCTCAGCATT TTTGGGAAAA GCAGGGCTTT AAATCAATTG 1992 0 

GATGCGAGGT TAAGCAAGAA CTCTATACGG TTGTTATCGC TGAACAGAGC CTAGAAGATT 19980 

AGAAATGGCA TCAAGTAAGA ACTATTTGGA ATTTGTTTTG GAACAATTAT CAGGATTAGA 20040 

TGATGTGACT TACCGTTCCA TGATGGGGGA GTATATTCTT TACTTCCGCG GCAAGATTAT 2 0100 

TGGCGGCATT TATGACGATC GCTTTTTAGT TAAACCCGTG CAAGCAGTCT TAGATAAGAT 2 016 0 

TGACCAATCT TCTTTTGAGT TTCCATACAA AGGTGCCAAA GAAATGATTT GAGTGGAAGA 2 0220 

ACTTGATAAT AAGATGTTTC TATAAGACCT AATTTTAGCT ATGTATAACC AACTGCCAAC 2 0280 

GCCCAAACCT AAAAAGAAAA AGCAAGGGTG AACGAAGTAA AAAAGAAGTC TGCTAAGGCC 2 0340 

CTGTCTTTGC ACGGGTAAAA T T T TAT AT AT AAAAAGAAGC TGGGACTAAA GAGCTCAGCT 2 0400 

TCCTTTGGTT TATATAATTG TCATTACAAG ACGAAGTGGT TGGGCGAAAC TCTGTTGACT 2 0460 

TTATTCAATT TAGAGTTTCT TATGCACAAT TGAGTCTGGA ACGAAAGTCT CCAGTTGCAA 2 052 0 

AGTATACAGT ACAATAAACC AACGATGTAA TAGCTGATGA CACAAAGCAC AGTGGGTAGG 2 058 0 

ACTTGCGAAG TCACCCTTTT CTTTTCAAAA TTTATACTAA ATCATTGATA TCAGTGTAGT 20640 

CACGATTAAG TCCTTGAGCA ACTGGTAGGT TAGTCAAGTA ACCTTGATAA GTAGTCACAC 2 0700 

CTTGACGCAA GCCTTCATCT TCAGAGATTG CTTGTGCGAA TCCTTTGCCA GCCAAAGCTT 2 0760 

CGATATAAGG AAGAGTGACA TTGGTTAGGG CGATGGTTGA AGTGCGAGCA ACCGCACCAG 2 0820 

GGATATTGGC AACGGCATAG TGGAGAACAC CGTGTTTTTC ATAGACGGGT TCATCGTGCG 2 0 880 

TTGTCACACG GTCAGCTGTT TCGATAACGC CACCTTGGTC AACAGCAACG TCAACGATAC 20940 

AGAGCCTGGA CGCATTTGTT TGACCATCTC ATCTGTCACC AATTCCGGTG CTTTTGCACC 21000 

AGGGATGAGA ATGGCTCCAA TCACCACATC AGCATCTCTC ACACTTGCTT CAATGTTGAA 21060 

TGAATTAGAC ATAAGAGTTT GAATTTGACT TCCAAAGACT TCTTCTAGAA CTGAGAGACG 21120 

CTTGGAACTA ATATCTAAAA TAGTCACTTG AGCACCAAGA CCAAGGGCGA TGCGGGCAGC 21180 

ATGTGTACCG ACGACACCAC CACCGATGAT AGTTACTTTT CCTTTTGGAA CACCTGGTAC 21240 

ACCACCAAGT AG AAC AC C AG AGCCACCAGC TTGCTTAGTA AGGAAGTGAG CTCCGATTTG 21300 

AACAGCCATA CGACCTGCAA CCTCACTCAT AGGAACGAGG AGCGGTAGTT GTCCTTGATT 213 60 

GTCACGAACA GTTTCAGTTG TTTTTGCTGT TAACATAGCA TCTGCTAATT CTGGAGCAGC 21420 

GGCCATGTGC AAGTAGGTGA AGAGAAGAAG ATCGTCGCGC AAGTAACCGT ATTCAGAACT 21480 

TAAAGATTCT TTTACTTTCA CAACCAACTC TGCTGCCCAA GCTTCACCAG CAGTAGCGAC 21540 

AATCTCAGCT CCTTGCTTTT GATAGTCAGC ATCAGTAAAG CCAGAACCGA GACCAGCATT 21600 
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TGTTTCGATA AGGACACGAT GACCACGACT AACTAAGCTA TGAACACCTG CAGGTGTGAG 21660 

GGCGACACGG TTTTCGTTAT TTTTAATTTC TTTTGGGATT CCGATTAACA TTGAGATAAC 2172 0 

CTACCTTTCA ATTGACGGTC TTGTTTTGGT TGTCACATTC CAGTTCATAA ATCAAAAATG 2178 0 

TGACGGTTTC ATTGTATATG AAACCGCTTC AAAAATCAAG AAAAACTTGT CATCCAAATT 21840 

TTTTTATGCT AGACTAGTGA AAATCAAGCT CTAATGGAGG GAAAAGTATG GAATCAATAT 21900 

TTGTGAAATT TGCCCAGTAT CCGTCTATAG AAACGGAGCG TTTATTGCTC AGACCTGTAA 21960 

CTTTGGATGA TGCGGAAcAA TGTTTGACTA TGCCTCGGAC AAGGGTAATA CACGTTACAC 22 020 

TTTTCCAACC AATCAAAGCT TGGAAGAAAC CAAGAATAAC ATTGCTCAGT TCTACTTGGC 22 080 

TAATCCCTTG GGACGTTGGG GAATAGAACT AAAAAGCAAT GGTCAGTTTA TTGGAACCAT 2 2140 

TGACTTGCAC AAGATTGATT CTGTTCTTAA GAAGGCAGCT ATTGGCTACA TTATCAATAA 22 200 

AAAGTATTGG AATCAAGGAT TAACGACAGA AGCCAATCGT GCTGTGATTG AGCTAGCTTT 222 60 

TGAGAAGATA GGGATGAATA AGTTGACTGC CCTTCACGAT AAGGCTAATC CCGCGTCAGG 2 2 320 

AAAGGTCATG GAGAAATCAG GCATGCGTTT TTCCCATGCA GAACCATATG CTTGTATGGA 22380 

CCAGCATGAA AAAGGCCGAA TCGTGACAAG AGTTCATTAT GTCTTGACCA AGGAAGACTA 22440 

TTTTGCAAAT AAATAAGCAG TTGAAAAGAA ATTTTTCGAC TGTTTTTTCT TCCTCTTACG 22500 

AATAATCTAA GAGAGGAGAA AATATGGAAG CAATTATCGA GAAAATCAAA GAGTATAAAA 22 560 

TCATCGTCAT CTGTACTGGT CTGGGCTTGC TTGTAGGAGG ATTTTTCCTG CTAAAACCAG 22 620 

CTCCACAAAC ACCTGTCAAA GAGACGAATT TGCAGGCTGA AGTTGCAGCT GTTTCCAAGG 2 2 680 

ACTCATCGAC CGAAAAGGAA GTGAAGAAGG AAGAAAAGGA AGAACCCCTT GAACAAGATC 22 7 40 

TAATCACAGT AGATGTCAAA GGTGCTGTCA AATCGCCAGG GATTTATGAC TTGCCTGTAG 22 800 

GTAGTCGAGT CAATGATGCT GTTCAGAAGG CTGGTGGCTT GACAGAGCAA GCAGACAGCA 22 860 

AGTCGCTCAA TCTAGCTCAG AAAGTTAGTG ATGAGGCTCT GGTTTACGTT CCTACTAAGG 2 2 920 

GAGAAGAAGC AGTTAGTCAA CAGACTGGTT CGGGGACAGC TTCTTCAACA AGCAAGGAAA 22980 

AGAAGGTCAA TCTCAACAAG GCCAGTCTGG AAGAACTCAA GCAGGTCAAG GGACTGGGAG 2 3040 

GAAAACGAGC TCAGGACATT ATTGACCATC GTGAGGCAAA TGGCAAGTTC AAGTCAGTAG 2 3100 

ACGAGCTCAA GAAGGTCTCT GGCATTGGTG GCAAAACAAT AGAAAAGCTT AAAGACTATG 23160 

TTACAGTGGA TTAAGAATTT CTCTATTCCC CTAATTTACC TGAGTTTTCT ATTACTTTGG 2 3220 

CTTTATTACG CTATTTTCTC AGCATCTTAT CTTGCTTTGT TGGGCTTTGT TTTTCTGCTA 2 32 80 

GTCTGTCTCT TTATCCAATT TCCGTGGAAA TCTGCTGGTA AAGTTCTAAT AATTTGCGGA 23 340 

ATCTTTGGAT TTTGGTTTGT TTTTCAAAAT TGGCAACAGA GTCAAGCGAG TCAAAATCTG 23 4 00 
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GCGGATTCTG TTGAAAGGGT ACGGATTTTG CCTGATACTA TTAAGGTTAA TGGTGATAGT 23460 

CTATCCTTTC GTGGCAAGTC TAACGGTCGT GCTTTCCAAG TCTATTATAA ACTCCAGTCC 2 3 520 

GAGGAGGAGA AAGAAGCCTT TCAAGCTTTA ACTGACCTGC ATGAGATAGG ACTAGAAGGG 2 3580 

AAGCTTTCGG AGCCAGAAGG GCAGAGAAAT TTTGGTGGCT TTAATTACCA AGCCTATCTG 2 3 640 

AAGACTCAGG GAATTTACCA GACTCTCAAT ATCAAAACAA TCCAGTCACT TCAAAAGATT 2 3 700 

GGCAGTTGGG ATATAGGAGA AAACTTGTCC AGTTTACGTC GAAAGGCTGT GGTTTGGATT 2 3760 

AAGACGCACT TTCCAGACCC TATGGGCAA? TACATGACAG GACTCTTGCT GGGACATCTG 23820 

GACACCGACT TTGAGGAGAT GAATGAGCTT TATTCCAGTC TAGGAATTAT CCACCTCTTT 23 880 

GCCCTATCTG GCATGCAGGT AGGTTTTTTC ATGAATGGAT TTAAGAAACT TCTCTTGCGA 23940 

TTGGGCTTGA CCCAAGAAAA GTTGAAATGG CTGACTTATC CCTTTTCCCT TATCTATGCG 24000 

GGACTAACTG GATTTTCAGC ATCGGTTATT CGCAGTCTCT TGCAAAAGCT ACTGGCTCAA 2 4060 

CATGGGGTTA AGGGCTTGGA TAATTTTGCC TTGACGGTGC TTGTCCTCTT TATTGTCATG 24120 

CCAAACTTTT TCTTGACAGC AGGAGGAGTC TTGTCCTGCG CTTATGCTTT TATCCTGACC 24180 

ATGACCAGCA AAGAAGGGGA GGGGCTCAAG GCTGTTACTA GTGAAAGTCT AGTCATCTCC 24240 

TTGGGCATAT TGCCCATTCT ATCCTTCTAT TTTGCGGAAT TTCAACCTTG GTCTATCCTT 24300 

TTGACCTTTG TCTTTTCCTT TCTTTTTGAC TTGGTCTTCT TACCGCTCTT GTCTATCTTA 243 60 

TTTGTCCTTT CCTTTCTCTA TCCAGTCATT CAGCTGAACT TTATCTTTGA ATGGTTAGAG 2 44 20 

GGCATTATTC GCTTGGTCTC GCAGGTGGCA AGGAGACCAC TTGTCTTTGG TCAACCCAAC 24480 

GCATGGCTTT TAATCTTATT GTTAATTTCC TTGGCTTTGG TCTATGATTT GAGGAAAAAC 2 4540 

ATTAAAGGAT TAACAGTATT GAGTTTATTG ATTACAGGTC TCTTTTTCCT TACCAAGTAT 24600 

CCACTGGAAA ATGAAATCAC CATGCTGGAT GTGGGGCAAG GAGAAAGTAT TTTCTACGGG 24 660 

ATGTAACTGG GAAAACCATT CTCATAGATG TAGGTGGTAA GGCAGAATCT TATAAGAAAA 247 20 

TCAAAAAATG GCAAGAAAAG ATGACGACCA GCAATGCCCA GCGAACCTTG ATTCCCTATC 247 80 

TCAAAAGTCG AGGAGTAGCT AAGATTGACC AGCTAATTTT GACTAACACG GACAAGGAGC 24 840 

ATGTTGGAGA TTTGTCAGAG ATGACCAAGG CTTTCCATGT AGGGGAGATT CTAGTATCAA 24 900 

AAGACAGTCT GAAACAGAAG GAATTTGTGG CAGAACTACA GGCGACTCAA ACAAAGGTGC 24960 

GTAGTATGAT AGTAGGGGAG AACTTGCCCA TTTTTGGAAG TCAGTTAGAA GTTCTATCTC 2502 0 

CAAGGAAAAT GGGAGATGGA GGACACGATG ATACCCTAGT TCTGTATGGG AAATTCTTGG 25080 

ATAAGCAATT TCTCTTCACG GGAAATTTGG AGGAGAAAGG AGAGAAGGAC TTGCTGAAGC 2514 0 
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ACTATCCAGA CTTGAAAGTA AATGTTTTGA AAGCTAGCCA ACATGGCAAT AAAAAATCAT 25200 

CAAGTCCAGC CTTTCTAGAA AAACTCAAAC CAGAGCTTAC TCTTATCTCA GTTGGAAAGA 25260 

GCAATCGAAT GAAACTCCCC CAT C AGG AAA CATTGACACG ACTGGAAGGT ATCAATAGCA 2532 0 

AAGTTTATCG AACTGACCAG CAAGGAGCTA TACGTTTTAA GGGGTTGGAT AGTTGGAAAA 253 80 

TCGAAAGTGT TCGATAGGAA GGATAAATGT TGTAGATTAG TGAAATAAAC TAAAAATTTG 2 5440 

TTGCATAATA ATGATAAAAA TGGTATAATG AAAACGTATT CAATATTGAG GATATAAAAT 2 5500 

CATTAAAAAT CAGCAAAAGT TGTTTTATTA GTTAGTTTAT AATCTATTGG TCTTCTTCAG 25560 

TCCAGTGTAT CTGCTGTGAC AGTCACTAAA AGTTACAAGT ATGATTGGAA TACGGTTTGG 2 5620 

GAATATAGTA CCAACTATCA CGACCATCAG TATGCTTGGA TTCCGTCATG GTCTCGTTAT 2 5680 

GACAGCTATT CTGAGTATAA AGTTGGCGGA GGCTGGAACT ACGCTCGTTA TGAGGTCATA 2 5740 

AACTATTACA GCGGAGGCTA TTAATTCTTA AAGAGTGAGA AAAAGGAGGG CTAGATATGT 2 5800 

TGCAGCTTAC TCATGTGACC TTAAAAACGC GACAAGTCAT CTTGCAAGAT GTGGATTTCA 2 5860 

CCTTTAAAAA GGGTAGGGTT TATGGTCTTC TTGCTATCAA TGGCTCTGGA AAGACGACCC 2 5920 

TGTTCCGTGC CATTAGCAAT TTAATTCCCA TAAGTAGTGG AAATATCGCA GCCCCTCCTT 2 598 0 

CTTTATTTTA TTATGAGAGT ATTGAATGGC TGGATGGAAA CTTAAGTGGG ATGGACTACC 2 604 0 

TTCGTCTTAT CAAAAACATC TGGAAGTCAG GTCTGAACTT GAGGGATGAA ATCGCCTATT 2 510 0 

GGGAAATGTC TGACTATATC AGTCTTCCCA TTCGCAAGTA TTCCTTAGGC ATGAAGCAAC 2 6160 

GCTTGGTGAT TGCCATGTAT TTCCTCAGTC AGGCCAAATG CTGGCTCATG GATGAGATTA 2 6220 

CAAATGGCTT AGATGAGTAT TATCGACAGA AGTTTTTTGA TAGGCTAGCA CAAATCGATA 2 6280 

GACAAGAACA GCTGGTTCTT TTAAGTTCCC ACTATAAGGA AGAGTTGGTT GATGTCTGCG 2 534 0 

ATAGAGTAGT AACCATTCAT CAGGGGCAGA TAGAAGAGGT TTAGTTTATG AAAGATGTTA 26400 

GTCTATTTTT ATTGAAAAAA GTTTTCAAAA GCCGCTTAAA CTGGATTGTC TTAGCTTTAT 2 646 0 

TTGTATCTGT ACTCGGTGTT ACCTTTTATT TAAATAGTCA GACTGCAAAC TCACACAGCT 2 6 520 

TGGAGAGCAG GTTGGAAAGT CGCATTGCAG CCAACGAGAG GGCTATCAAT GAAAATGAAG 2 6580 

AGAAACTCTC CCAAATGTCT GATACCAGCT CGGAGGAATA CCAGTTTGCT AAAAATAATT 2 6 640 

TAGACGTGCA AAAAAATCTT TTGACGCGAA AGACAGAAAT TCTGACTTTA TTAAAAGAAG 2 6700 

GGCGCTGGAA AGAAGCCTAC TATTTGCAGT GGCAAGATGA AGAGAAGAAT TATGAATTTG 2 67 60 

TATCAAATGA CCCGACTGCT AGCCCTGGCT TAAAAATGGG GGTTGACCGC GAACGGAAGA 2 6820 

TTTACCAAGC CCTGTATCCC TTGAACATAA AAGCACATAC TTTGGAGTTT CCGACCCACG 2 6880 

GGATTGATCA GATTGTCTGG ATTTTAGAGG TTATCATCCC AAGTTTGTTT GTGGTTGCTA 2 6940 
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TTATTTTTAT GCTAACACAA CTATTTGCAG AAAGATATCA AAATCATCTG GACACAGCTC 2 7 000 

ACTTATATCC TGTTTCAAAA GTGACATTTG CAATATCCTC TCTTGGAGTT GGAGTGGGAT 2 7060 

ATGTAACTGT GCTGTTTATC GGAATCTGTG GCTTTTCTTT TCTAGTGGGA AGTCTGATAA 2 7120 

GTGGTTTTGG ACAGTTAGAT TATCCCTACC CAATTTATAG CTTAGTGAAT CAAGAAGTAA 27180 

CTATTGGGAA AATACAAGAT GTATTATTTC CTGGCTTGCT CTTAGCTTTC TTAGCCTTTA 27240 

TCGTCATTGT GGAAGTTGTG TACTTGATTG CTTACTTTTT CAAGCAAAAA ATGCCTGTCC 27300 

TCTTTCTTTC ACTCATTGGG ATTGTTGGCT TATTGTTTGG TATCCAAACC ATTCAGCCTC 27 3 60 

TTCAAAGGAT TGCACATCTG ATTCCCTTTA CTTACTTGCG TTCAGTGGAG ATTTTATCTG 2742 0 

GAAGATTACC TAAGCAGATT GATAATGTCG ATCTAAATTG GAGCATGGGA ATGGTCTTAC 27480 

TTCCTTGCCT GATTATCTTT TTGCTATTGG GAATTCTATT TATTGAAAGA TGGGGAAGTT 27 540 

CACAGAAAAA AGAATTTTTT AATAGATTCT AGCTTTCCTA TAGGTAGGGA AAATAAGTAA 27 600 

AAACTAACAT AGAGAGGGAA TCAACTTGAT TCTCTCTTTT TGATTCGAAA ACCAAACCAA 27 6 60 

AATACAAACA CAAACTTTTC AAAAAATAAC TTTTTATCTT GACAAGAGCT AGAAAACTTG 2772 0 

GTATCATATA AAAGTTGAGA AAAGCAGAAG TGAGAGCTTC TCGCCTTGTG ACATTAAGTT 27780 

GCCTGGCCCT ACGGATGAAA AGTTTCGAAG AAACGCTATC ATAACGTGCG GGCTTGTATA 27840 

TTTACAAGTC CGCTATTGTT TTTCTCTAAT AAAACAAAAG AGGTGAAAAC CATAGCAAAG 27900 

CAAGACTTAT TCATCAATGA TGAGATTCGT GTACGTGAAG TTCGCTTGAT TGGTCTTGAA 279 60 

GGAGAACAGC TAGGTATCAA GCCACTCAGT GAAGCGCAAG CTTTGGCTGA TAACGCTAAT 28020 

GTTGACCTAG TATTGATTCA ACCCCAAGCC AAACCGCCTG TTGCAAAAAT TATGGACTAC 28080 

GGTAAGTTCA AATTTGAGTA CCAGAAGAAG CAAAAAGAAC AACGTAAAAA ACAAAGCGTT 2814 0 

GTTACTGTGA AAGAAGTTCG TCTAAGTCCG G 2 8171 
(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH : 7147 base pairs 

(B) TYPE : nucleic acid 

(C) STRANDEDNESS : double 
CD) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 

CCGCTCAACT TTTGCAATCA AGGCTAAGTA GACAGCAGCA AATTTCATAT TGTATAATTT 6 0 

CTGACTCATA CTTCTCTCTT TCTATGTGTA CTAGTATAAA TAAGAAAAAG AAGGCCGTCA 12 0 
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AGCCTTCTTT TGATTTATTC TTCTGCTTCA TCTTCTGTAA ATTGACTATT GTACAAGTCA 180 

GCGTAGAAGC CACCTTGCGC CATCAGTTCC TCATAGTTGC CTTGCTCGAT GATATTTCCA 240 

TCTTTCATGA CCAAGATCAA GTCTGCATTT CGGATGGTTG ACAAGCGGTG GGCAATGACA 3 00 

AAGGATGTGC GTCCTTCCAT CAAACGGTCC ATGGCTTTTT GGATCAATTC CTCTGTCCGT 3 60 

GTGTCAACAG AAGAAGTCGC CTCATCCAAA ATCAAAAGCG GTGCATCCTT AAGAAGGGCA 42 0 

CGAGCAATAG TCAATAGTTG TTTTTGTCTT ACAGACAAGG TCACGGTGTC ATCCAAGATG 480 

GTATCATAGC CATCTGGCAA GGTCATAATA AAGTGGTGAA TTCCCACAGC CTTACTAGCT 54 0 

TCCATCATTC GTTCATCACT AATCCCTATT TGATTATAGA TGAGATTGTC TCGAATAGTT 6 00 

CCTTCAAAGA GCCAGGTATC CTGCAAGACC ATTGAAAAGG CATCATGCAC TTCTGAACGC 6 60 

GTCATAGCCT TGGTATCCAC ACCATCAATG CGAATACTTC CCTTATCAAT CTCATAGAAT 720 

TTCATCAAAA GATTGACAAT GGTTGTCTTA CCAGCCCCAG TCGGCCCAAC AATGGCAACC 780 

TTTTGACCAG CATGAGCTGT CGCAGAGAAG TCATAGTCTT GAACATTGAC ACCGTCCACC 84 0 

AGAATTTCTC CTGCTGACAC GTCGTAGAAA CGTGGAATCA GATTGACCAG AGTTGATTTA 900 

CCAGAACCTG TTGACCCAAT AAAGGCCACT GTTTCACCAC TTTCTGCTTT AAAGCTAACA 9 60 

TGTTCAATAA CTGCCTCCGA ATTTGCCGCA TAGCGgAAGG TCACATCCTT AAACTCGACC 102 0 

TGACCTTTGA AGTTTTCATC AGTCAGCTGC ACTTGAACAG GGTTTTGGAT AGAAGAATGC 1080 

AAATCTAAAA CTTGATTAAT CCGCTTAGCA GAGACCATAG TTCGGGGAAG AACGATGAAG 1140 

AGTGCTCCCA TGAGAAGGAA GCCCATGACA ACCTACATGG CATAAGACAT GAAAACAATC 12 00 

ATGTCACTAA AGAGAGGCAG ACGCGCTATC GGAGCAGCGT CGTTAATCAC ATAGGCCCCA 12 60 

ATCCAGTAAA TCGCCACACT CAAACCACTT GAAATCCCCA TCATGATAGG ATTCAAAATA 132 0 

GCCATAAGAC GGTTGACAAA CAAATTCAAA CGGGTCAATT CATCATTTAC TGCTGCAAAT 1380 

TTTTCATTTT GATAATCCTC TGCATTGTAG GCACGAACGA CACGAATACC TGTTAAACTC 1440 

TCACGAGTGA TACTGTTCAG TTTATCTGTC AGCCCCTGAA TCAAGGACTG TTTTGGAAAG 1500 

GCTAGCGTCA TCAAAACGGT CGTCATCAGG ACGTTGATAA TCACTGCCAC AAGTACGGCC 1560 

CAGAGCCAGT ATTCTGAATG ACCTAAAATC TTCCCAATAG CCCAGATAGC CATAATTGAA 162 0 

CCACGCGTTA CCACTTGCAA GCCCATAGTA ATCAACATTT GAACTTGAGT AATGTCATTG 1680 

GTAGTACGCG TCAAGAGGCT AGGAATTGAA AATTTCTTAA TCTCTGTCTG CGAGTAATCC 1740 

AAAACTCGGT TAAAAATATC ACTTCTCAGC CTACTAGTAT AAGAAGCCGC CACTCGGGAT 1800 

GCAAAAAATC CAACTGCAAC TACGGACAAG AAGGCAAGAA AGGACATTCC CATCATCATG 1860 

CTTGCCGACT GCCACAACTC ATCTAAATTA GTTTCTTGAC TACCTAGCAA ATCCGTAATT 192 0 
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TTCGAGATAT AGGTCGGCAC TTCCAACTCT AGATAGACCG AAAAGCAAGT AAAG AG AAT G 1980 

GCTAGTAAAA TCATCCCCCA TTCTTTTCTA CTAATTCTTT TGGCTAATTT CTTTATTCTC 204 0 

TCCTCCTATT CCCTTGATAT TTTGCCTGTA GTTGACCGAG AACCTTCTCA AAAATCAGTA 2100 

ATTCATCTTC ATCAATGTCT TCCATCAACT GCTTGTCTAT GCGTTCAAAA AAAGCCTTAA 2160 

CCTGTTGCAT CTGAGAACGT GCTTTGTCCG TCAGACGAAC AAACTTAGCC CGCTTATCAA 222 0 

CAGGACTCGC CTCCAATTCC ACCAAACCAT TTTGCACTAT ACGCTTAACC AGATTACTAG 228 0 

CAACAGGCTT GGTAATATTG AGTTCCTGCT CGATATCTTT AATCAAGACC AAGTCTTGGT 234 0 

TTTTCTCGCG ATTATCCAAA AAACGCACAA CCTGACCTTG CGGCCCACCC ATAAATTCAA 2400 

TGCCGCAACG TTTGGCTTCC TTTTGCACCA TCAGGTGAAT TTGATGACCA AAACGCTTAA 2460 

AGACTAACAT CGGTTTATCC ATAATCTCCC CCTTCTAAAT AAAAATAGTT CTCTGGAGAA 252 0 

TAATTAAATT TCTATGAGAA CTATTTTCTT GATTAAAAAA ATCCCAAGTG ATTTTCTCAC 2 580 

TTAGGATCAT GTTCTATAGG TTAAATTAAA ACCCATCTAC GTTCGTATAA ATCTTTTGGA 2 64 0 

CGTCTTCGTC GTCTTCAAGA ACGCTGTAAA GTTTTTCAAA GGTTTCAAGG TCTTCGCCTG 2700 

ACAATTCCAC TTCTGACTGA GGAATCATTT CCAATTCAGT CACTTGGAAT TCTTCAATAC 27 60 

CAGACTCACG GAGGGCAACG ATAGCCTTGT GAAGGTCAGT TGGCGCTGTG TAAACTGTGA 282 0 

TTGTACCTTC TTGTGCTTCT ACGTCATCCA CATCCACATC CGCTTCGAGC AATTGCTCAA 2880 

AGACTGCGTC CGCATCTTCA CCTCCAAATA CAATAACACC TTTGTTGTCA AAGAGGTAAG 2940 

AAACAGAACC TGAAGCGCCC ATGTTTCCGC CGTTTTTACC AAAGGCTGCA CGGACATTGG 3000 

CTGCTGTACG GTTGACGTTA GAAGTCAAAG TATCCACAAT TAGCATAGAG CCATTTGGCC 3060 

CAAAACCTTC GTAACGTCCT TCTGTAAAGG TTTCGTCTGT GTTTCCTTTG GCTTTATCAA 312 0 

TCGCTTTATC GATAATGTGT TTTGGCACTT GGGCTTGTTT AGCACGGTCG ATAACGAATT 3180 

TCAAAGCTGA GTTTGATTCT GGATCTGGAT CACCTTTTTT AGCTGCTACA TAGATTTCTA 3 24 0 

CACCAAATTT TGCATATACT TTAGAGTTAG CTCCATCTTT AGCCGTTTTC TTGGCTACGA 3300 

TATTGGCCCA TTTACGTCCC ATTAGGAATC TCCTTTTTTC ACATTTTAAT CTTTCTTATT 3360 

ATAACACAAG TTTTTTTGAT TTTCACTAGA GGAAATGGAT TTTATTAGCA AATCAAGCTA 3 42 0 

GGATAGCACT TTACCTGCTA AGATGGTCTT GCCTTTCTAT CTTTATCAAC AGGCACTCAT 3480 

CCACATTCAA AAAACAAACT AGACCATTAT CTGCAAATAG AAAGTTTCAG CCAAGTTTGA 3 54 0 

CAAAGTCAGC TCAAATTACT GTTTGAAGTT TGTAGATATA AGCGACAAAA ACAATCATAC 3 600 

TGCACCTTTT GTTGACAGTC TACTCCAGAC ATATCATAGT TCAAGTAAAT ACTTTGAAAT 3660 
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TCAACAGTTC TTATAGGCGC TATTGTATTC TAAGAAATCA ATAGAAGAGT TTCTAAGCAA 3 72 0 

ACCTCTAATA CTCAATAAAA ATCAAAGAGC AAACTAGAAA GCTAGCCTCA GGTTGCTCAA 37 80 

AACACTGTTT TGAGGTTGCG GATGGGGCTG ACATGGTTTG AAGAGATTTT CGAAGAGTAT 3 840 

AATTTACGTG TTCCCAAGAT GGAGAAGTTA GACTAGTACA CTGGCACTTC TAAAACATTG 3 900 

CTAGCAATTG ATTTGTTCAT ATTTAATTTC ATTTTTTCCA TAAATGGGTA TTAGATATAA 3 9 60 

ACAGCAAAAT ATTTCCGATA CGTGTCGTTC TTGAATTTCC AATCATCTAA AACAAGTAAA 4 020 

GGATAATCAA TCCCCTGTAT ATCAAGGAAT TGGCTACCCT TTTTACTTTT TTACACATTC 4080 

TGTTTGATAG ATTCATTTTA ACATCACGAG CATACTCCAA TGGAAATCGC TAGGCAAGAG 4140 

ATAAACTTTC AGATATCCGC AGAGAGATCA TCGCCTCTTT TTGTCGCAAG CATTCTCCTC 4200 

TCCTAGTCAT TTTCTACCTT ATCTTCTACC TGAGGATAGA GAGTTGTTCC CCAAATAGAA 42 60 

ATCGTCCGCT TACGCACTAG TGGCAAATCG GTTTTTTCAT AAACCGTACG CCACCATTCC 4320 

CAGGCAAGCC CGGTACACTC TCTAATTTTG ACAGAGAGAT TACGAACATT CCCTTTTAAA 4 3 80 

GGAATACTAG TGGTAAAGTG AGCCGTTAAA TCCTGCCCAT TTCTGTCCCA AGCCTTAGGA 44 4 0 

GTCAAGACTT CCTTACCTTG ATGATCATAG GATAATTCAT TCCAAGTAAT ATAATATTGG 4 5 00 

GCAACATAGG CACCACTATG ATCCAGCAGT AAATCTCCGT TTCTGTAAGC TGTAACCTTA 4 560 

GTCTCAACAT AGTC TGT ACT ATTTTGAAAG GTCGCAACTA CATTGTCACG TAAAAAAGAA 4 62 0 

GTTGTATAGG AAATCGGCAA GCCTGGATGA TCTGCTGTAA AGCGACTGCC TTCTTGAATC 4 680 

AAGTCCTCTA CCATATCCAC CTTGCCTGTT ACAACTCGGG CACCCGAACT TGGGTCGCCC 4740 

CCTAAAATAA CCGCCTTCAC TTCTGTATTG TCCAAAATCT GTTTCCACTC TGTCTGAGGA 4800 

GCTACCTTGA CTCCTTTTAT CAAAGCTTCA AAAGCAGCCT CTACTTCATC ACTCTTACTC 4 8 60 

GTGGTTTCCA ACTTGAGATA GACTTGGCGC CCATAAGCAA CACTCGAAAT ATAGACCAAA 4 92 0 

GGACGCTCTG CAGAAATTCC TCTCTGTTTT AAATCCTCTA CCGTTACAGT ATCTTGAAAC 49 80 

ACATCTCCTG GATTTTTAAC AGCATCTACG CTGACTGTAT AATAAATCTG CTTAAAATTA 504 0 

ACAATCTGAA TCTGCTTTTC GCCTGAATGG ACAGAGTTAA AATCAATATC AAGAGAATTC 5100 

CCTGTCTTTT CAAAGTCAGA ACCAAACTTG ACCTTGAGTT GTTCCATGCT GTGAGCCGTG 5160 

ATTTTTTCAT ACTGCATTCT AGCTGGGACA TTATTGACCT GACCATAATC TTGATGCCAC 522 0 

TTAGCCAACA AATCGTTTAC CGCTCCGCGA ACACTTGAAT TGCTGGGGTC TTCCACTTGG 52 80 

AGAAAGCTAT CGCTACTTGC CAAACCAGGC AAATCAATAC T ATAAGT CAT CGGAGCACGA 5 34 0 

TCGACCGCAA GAAGAGTGGG ATTATTCTCT AACAAGGTCT CATCCACTAC GAGAAGTGCT 54 00 

CCAGGATAGA GGCGACTGTC GTTGGTAGCT GTTACAGAAA TATCACTTGT ATTTGTCGAC 54 60 
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AAGCTCCGCT TCTTTCTTTC GATAACAACA AACTCATCGG GTAGCTGATT ACCCTCTTTG 5520 

ATGAAACGAT TTTCAATACT TTCTCCCTGA TGGGTCAAGA GTTTCTTTTT ATCGTAATTC 5580 

ATAGCTAGTA TAAAGTCATT TACTGCTTTA TTTGCCATCT TCTACCTCCT AATAAGTTCC 564 0 

TGGATTGAGT TGCATAAACT CAGACTTGTT CAGCGAAATC AGCCGTGGTT GGACTAAGTA 5700 

ATCCAAAATT TCCTCGTACA ATTCTTCTGA GACATTGCGT CGCCGTCTGG CTAAATAAGA 576 0 

AGTCGGAATG ACCGTATTAT CCAACATAAA TACCTTATCT AAGTCAATCA AGGTTGGTCT 5820 

TGTAAAAGGA TTACGAGCTA GATCCGGCTC TTCTATCATA AAGTTCTTGA CCAAACGTCT 5880 

GGTCAAGAGA GCTGGTTTGA AGGTCTGATT TTTAACCAAC TCTTTGTTTT TAGTCATGCT 5 940 

GTTGTCAATA CAGATATACA TATGATTCTT CACAGCCAAA TCGCTACTAA TAGTCGGAAA 6000 

AGGCAAATAA AGAGCTACAA CATCTCCTCT CTTAATCAAG CAAGAGCACC CCCTTTTCTC 6060 

CTAATGTAAC ATAGACAGGA TTGACCAAGT CTTCTGATTG ACTCAGAATT TCCAAAGTTT 6120 

GAGTTTGGCG CGCTGTCAAT TTAGTAGCAT CTTGTCTCTT CAATACAAAA TGCTTGTC'GC 6180 

CAATAACCTT GACAATATAA TCCTTCTCCA AAGCTGACTG GTAAATCCAC ATCAGATGTT 6240 

GTCTGTCCTG AGAACTCAAG AGAGAAGGAT TTTCAAGCCT CCCGATAGTC TGATAAAAAT 6300 

CAAAAACAGG AGCTAACTCC TGCCAATCTG ATTGGCTAGT TGTCAAGGCT AGAAAAAGGG 63 60 

CTTTGCGAGC TGATACTTCT TGGTTAGCCT TGAGAGTTAC TTTCCCCTCC AAGTTTTTTA 6420 

GAAATCGGGA AACTCCAGAA AGCAAATTTT TCTCTAACTG CGAGAAATAA AAACCTTTCG 6480 

TTCCCAGACA TAAGTCTTTC ATGTCGCTTT CTCTAGCAAA TAAGAGCTCA AACATTTGAT 6540 

AGTAAAAGAA AAATATCTGG CACTGGGTCG CGCTCATCTT TTCCTTATCG GCTTCTTTTT 6 500 

TTAACCAGAG CAAGGGCGAC AGGTAGCTGG ATTGAGACAT TTCCTCTACC TCCTACTCTT 6 660 

TTTTAACTGG AGCATCTGCA CTAGCTGCCA CTTCTTTTGA CTGGATACTT TCCCACTGGT 6720 

TAATCTCCTC TGAGATAAGA CCTTCGCATG TCTTGACAAA TAGGGCAAAA GCCTTGGTCT 6780 

TTCCTGCATA TTTCTCCGTT TGGCATTGAT AGAGGAATTT TTCTTTCTCC AGGAGTTGCG 6840 

CAGTTTTTTG GTAAGAAATC CAATTTTCCT TTGCATTATA CAAATTGATA ATCCCCTCAC 6900 

ACAGCAAGCC GAGACTGGAT AAGGCAACCG AAATCAAACG GTAGCGATCA CCTGGCATAG 6 960 

GAATAGCACA AAAGACAGCT ATGAGGAAAC CTGCCACGAT TTCTGTTATT TTTAATACCT 7 020 

TATAGCGCCT ACGATGTTGA ACGCTTTTCT TTAAAAAATG AGCTATCTGT ACGTCTAATC 7080 

GCTCTGTCAG GTACATTTCT TCTGGCGTCA TATTCGTAAC TCCTTTCATT TACTTTGATA 7140 

ATCAGGG 7147 
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(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 755 base pairs 

(B) TYPE : nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY : linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 

CCGCATGGGA TTGGTGTCCT TTTGGGCAAT CTCTTTGACC AAACTGGAAA CATGTTTTAT 60 

GCGCCTGCCT TTACTGCCCT TGTCGGCGGT ACGTCTATAT GATCCTAGTC GCAAAAGTTC 120 

CGCGCTTTGG AGCCATTACC ACTATCGGCC TTGTCATTGC CCTCTTTTTC TTGGGAACTA 180 

AACACGGTGC TGGTTCCTTC CTTCCTGGAA TTATCTGTGG CCTCCTAGCA GATGGAGTAG 240 

CTCATTTAGG AAAATACAAG GACAAAACAA AGAACTTCCT TTCTTTCATT ATTTTCGCCT 300 

TTAGTACAAC AGGACCAATC TTGCTTATGT GGATTGCGCC CAAAGCCTAT ATGGCTACTC 360 

TTCTGGCAAG AGGAAAATCC CAAGAATATA TCGACCGTAT CATGGTCGCT CCAAACCCTG 420 

GAACTGTCCT TCTATTTATC GCAAGTATTG TCATCGGAGC CCTAGTGGGT GCCTTGATTG 480 

GACAAGCCTT GAGTAAAAAA TTTGCCCAGA AAATCTGATC AGTTAAAAAG AGCCACGCGG 540 

CTCTTTTTTA TTTATGGCTC AATTTCTTAG TCAAGAAATC TCCCAAGAAT TGGATTGCAA 600 

AGATAATCAA AATGATAATA ATGGTTGCCA AGATGGTCAC ATCGTGATTG TAGCGGTTAA 660 

ATCCATAAGC GATGGCTACG TTACCGATAC CACCAGCTCC AACCGCACCG GCCATAGCTG 720 

TTtcCCAACA AGGGaAtCAA GGTcACAGTC GTCAC 755 
(2) INFORMATION FOR SEQ ID NO : 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3010 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY : linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 

TTCAATTGGT ATCTCAATCA ACGGTCTTCA CATGGTTTCA ACTGGTTTGA CTCTTGAAAA 60 

AGCGAAAGCT GCTGGTTACA ACGCAACTGA AACAGGCTTT AACGATCTTC AAAAACCAGA 120 

ATTCATGAAA CATGACAACC ATGAAGTAGC AATTAAGATT GTCTTTGACA AAGATAGCCG 180 

TGAAATTCTT GGTGCCCAAA TGGTTTCACA TGATATTGCA ATTAGCATGG GAATCCACAT 240 

GTTCTCACTT GCTATCCAAG AGCATGTGAC AATTGATAAA TTGGCATTGA CAGACCTCTT 3 00 
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CTTCTTGCCA CACTTCAACA AACCATACAA CTAQATCACA ATGGCTGCCC TTACGGCTGA 360 

AAATTAAAAA TGAATGAGCT ATCTGGCCTT AAGTTAAGGT CAGATAGTTT TTAGCTAATT 42 0 

TGTCCCCATA CAATTATAGT TTTTTTATCT TGTGCTTCAT TCTGTTCTGA CTTAAAATGA 480 

AAAGGTAGCT ACCAATACAA ATGATGAGGA TAAAACAAAT GACTGAAAAT CGTTATGAAC 540 

TAAATAAAAA CTTGGCACAG ATGCTCAAGG GTGGTGTTAT TATGGATGTG CAGAATCCTG 600 

AACAGGCTCG TATCGCAGAA GCTGCTGGTG CGGCAGCTGT GATGGCCTTG GAACGAATTC 660 

CGGCTGATAT TCGTGCAGCT GGAGGAGTTT CCCGCATGAG CGACCCAAAG ATGATTAAGG 720 

AAATCCAAGA AGCGGTTAGT ATTCCAGTAA TGGCTAAGGT CAGAATCGGG CATTTTGTTG 780 

AAGCTCAGAT TTTAGAGGCT ATTGAAATTG ATTATATCGA CGAGAGTGAA GTTCTATCTC 84 0 

CAGCTGATGA CCGTTTCCAT GTGGACAAGA AAGAATTCCA AGTTCCTTTT GTCTGTGGTG 900 

CTAAGGATTT GGGTGAAGCC TTGCGTCGTA TCGCTGAAGG TGCTTCCATG ATTCGTACCA 9 60 

AAGGAGAACC AGGGACAGGG GATATCGTCC AAGCTGTTCG TCATATGCGT ATGATGAATC 1020 

AGGAAATTCG CCGCATTCAA AACTTACGTG AGGACGAGCT TTATGTTGCT GCCAAGGATT 1080 

TGCAAGTCCC TGTAGAATTG GTCCAATATG TTCATGAACA TGGAAAATTG CCAGTTGTAA 114 0 

ATTTCGCTGC TGGAGGTGTT GCAACGCCAG CAGATGCTGC GTTAATGATG CAATTAGGGG 12 00 

CAGAGGGGGT CTTTGTCGGT TCAGGTATTT TCAAGTCAGG AGATCCTGTT AAACGAGCGA 12 60 

GTGCCATTGT TAAGGCTGTG ACTAACTTCC GTAATCCTCA AATCCTAGCT CAAATCTCTG 132 0 

AAGATTTAGG AGAAGCCATG GTTGGTATTA ATGAAAATGA AATCCAAATT CTCATGGCTG 13 80 

AACGAGGAAA ATAGATGAAA ATCGGAATAT TGGCCTTGCA AGGGGCCTTT GCAGAACATG 144 0 

CAAAAGTGCT AGATCAATTA GGTGTCGAGA GTGTAGAACT CAGAAATCTA GATGATTTTC 1500 

AGCAAGATCA GAGTGACTTG TCGGGTTTGA TTTTGCCTGG TGGTGAGTCT ACAACCATGG 1560 

GCAAGCTCTT ACGTGACCAG AACATGCTAC TTCCCATCCG AGAAGCCATT CTATCTGGCT 162 0 

TACCAGTGTT TGGGACCTGT GCGGGCTTAA TTTTGCTGGC TAAGGAAATC ACTTCTCAGA 1680 

AAGAGAGTCA TCTAGGAACT ATGGATATGG TGGTCGAGCG TAATGCTTAT GGGCGCCAAT 1740 

TAGGAAGTTT CTACACGGAA GCAGAATGTA AGGGAGTTGG CAAGATTCCA ATGACCTTTA 1800 

TCCGTGGTCC GATTATCAGT AGTGTTGGTG AGGGTGTAGA AATTTTAGCA ACAGTGAACA 1860 

ATCAAATTGT TGCAGCCCAA GAAAAAAATA TGTTGGTAAG TTCTTTTCAT CCAGAATTGA 1920 

CTGATGATGT GCGCTTGCAC CAGTACTTTA TCAATATGTG TAAAGAAAAA AGTTGAGATT 19 80 

GAATTTCTCA ACTTTTTTAC ATGTAATAAA CAATAGCGAT GTATTGAAGT GCGGACGCAG 2040 
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CTAGGATAAA GAGATGCCAA ATCATGTGGA AATAAGGTTT TTTCTTGGCA TAAAATCCAG 2100 

CTCCAACTGT ATAACAGAGT CCGCCAGTTA CCATGAGACT CCAGAAAACG GGTGTCGTTT 2160 

GACTGATAAT GGCAGGAATG ATAGCCAGAA CCAACCAGCC CATAATCAGG TAAAGAGCAA 222 0 

GGCTAAATTT CTCATTGACC TTTTTAGCAA AGATTTTATA GAGAATACCA AAGATGGTCG 2280 

TTCCCCATTG GATGACAATA ATCAGATAGC CAAACCAGTT ATTCATCAAG GTCAAGACAA 2340 

CGGGCGTGTA TGAGCCGGCA ATGGCAACGT AAATCATAGA ATGGTCAATG ATTCGCAAAA 2 400 

CATATTTGTG GGTCGAACCA TAGGCCATAG AGTGATAAAT GGTGGATGAT AGGAACATGA 2460 

GAAAGAGACT GATGACGAAA ATGGAAACGC CGATAGAGGA TAAAAATCCG TGTGCTTCAT 2 520 

AACTATAGAT GGATGAAATA GGCAGCAAGA TAAGCATGAT GACTGCACCC ACAGCATGGG 2 580 

TCACGCTATT AGCAATCTCC TCTCCAAAAC TGAGTTGTTT GCTGAGTTTA AGACTAGTGT 2 640 

TCATTGGATT ACCTCCTCTT GAGTATGATC GATTAAGTCT AGAGTTTGAT GATAGAGTTT 2700 

AACGGTTTGG CAGCTGGTTT GGATAATAGG GTTAGCTGGG TCAATTCCTT GGTTCATGTA 2 760 

GTCCACAAAA GCATCGTAGA GTTGGTCTGA ACTTGCTTGA GTTTGTAGAG TATTAAGTGT 2820 

CTGGGCTATT TCTTGAATAG AAAATACAGA CTTGAGGGTT GTGATAGCAA TCAAACGGGC 2880 

AATCTGTTGG CGTTGGTATT TTTTTTTGTC AGGCTTTGTC AGGTAACCAT TTTTCACATA 2 940 

ATTGTTGACC ATAGATGCTG TTAGGCCCTT GTCTTTATTA GGAGAGATAG GGGCGCAGAC 3000 

CTGATTGACA 3010 
(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH : 15213 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY : linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 

CATAAATCGG TGCAAATAAC TTAATAGTGA AGTAGCCATT TCTTTCGTAT TTACCTGAGG 60 

CATATTCCCT AGACGAAAGA ATATTATTAT CAATCAAATC ATTGAATGAA CGTAGTCTTT 120 

CAACTTCTTC TACTGTTAGA TTTCTGACAA CATTTGTTGC ATAGACCTTA TTTCCATCAG 180 

GATCAGGATG GTACTCATTT GTAACTTTTC TAAGAAGTTG TTGTTTTTGA TTCGTATCCA 240 

ATTTAAGAAT TGAATTTCCT TCGAGATATT CCAACATATA AACAACGTCA AACATGTTGT 300 

GGACATATTG CTTCAAATCA TCTGCATTAT TAAATCTTGT AGTTGGATCA AGTACTTGTA 3 60 

ATCGTCGACT TTCTGTACTA TCAGATTTTG AATGTTTCAA GATGGAGTTG ATGGTAATGG 420 
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TCGCATCATC TGGATGGTCT GGTGCTTGTA ATAATCCTTT AGCAAAGAAC TCTGGTCCCA 480 

AGCCACTTCT TCGACCATAT CCTCCAAGAT AAATGTCCTG ATCTGAGTCA TGTGTCATCT 540 

CATGCGTATA AGTAATAGCT CCATCCTTAT CCAACATTCG ATAACCCATA TAATAAACTG 600 

CATCACCTGT AGCATAAGCA CCGTGTTGAT TATGCCCAAC TTTATTTCCA ACAGGTCCAA 660 

AGAAATGTTG CATTGCAGGA TTTGGATTAT CAAAATCTGC CACTTCTGTA GCTTTCCCTA 72 0 

CGGTATTATC ATCGCCAAAT TTATAAGCAT CGTAAAGCAA AATATTTCTA TAAAGTTTTT 780 

CACGTGCATT GTCGTCTAAA ATACGATACC AATAATCGTA GTGATCTCGC TGACGTTTGG 840 

CTGTTTCACG CGCATTTTCT TCAACAAAAT CATTGAGAGC CTTGCCCGCT TTATGGTCAC 900 

TACTGCGGTA GCGATCATAA GCTCCAAATC CTAGACTAGA CATGGTCGAG ATGACAAATA 9 60 

CGGATCTCTC TGGCAAGGTC AGGAGAGGCA AGACCATATT GCGGTATTTC CATGTGGCAC 102 0 

TCGTGATACG ATCATAAACA CCGATAGAAT ACTTGGTGCC AGCTAACCCT TGCTTCGTTT 10 80 

TCACCTCTTC GATAGTGGAT TTTTCTTCGA CAATGTAAGC CTTAGTCTCT GATTTAAACC 114 0 

AGTCATTATT GCTTGTATTT GGTAAAAAGA CTTTTCGGTA ATGTTCCAGC GTGCTAAACA 1200 

AATCTGTCGT TCCATGTTGA CTGGCAAGAC TGATACCATA AGTATCGACA TTATTCTTAG 12 60 

CTAGAAGATT GTTAAAGCCA GATTTACCCA ACTCAATCAG AGTATCTAAT GGTGAAGCAT 132 0 

TCCCCTTACC AAAGAAGTCC AAATGGTACA GAACTAGGTC TTTGACATTC ACCTGACCAT 1380 

AGCTAAAGTT ATACCACCGT TCCAGATAGG TCAAGCCAAG TAGCAAGGCT TCCTTGTTGC 1440 

GTTTGATTTT ATCTACAAGA TAACCTTCAG TGACGGGGTT AGCACTAGCC AGTCCAGCAT 1500 

CCGCTGACAA GAGTTTTTTC AAACTGTCTT CCAGTTGTTG TTTTGTTTTG GCGAACTGGT 1560 

CTTCTAGATA GAGCTCAGTT TGCTTGACGT TTGGAGAAAT ACCCAGCGTC TTTCTGATGG 162 0 

CTTCTGAATG ATAGTCAACC TTTTGTAAGT CAGGTAAGAC TTGCTTGATG ATAGAGGTTT 1680 

GGTCATACAG GAATTGGTTT GGCGTATAGA GAAGTCCAGT ATTGCCCAGA CTATATTCTG 1740 

CTAATTTGGC GAAATCATTC TGGTATTTGA GATCCAGCTT CTCAGATAAA TCATCCTTGT 1800 

AGTGAAGCAA GAGTTTGTTT GCAGTCTGTT TGTTAGAAAC AATGTCTGTG ATGACTTGGT 1860 

TGTCCTTCAT CATGACTGCT GACAAGAGTT CTTTTTGATA TAAAAGACTG TTCTCATTGA 192 0 

CCAGGTTTCC GTATTTGACG ATGGTTGCCT TGTTGTAGAA AGGTAGCAAT TTTTCAATGT 1980 

TTTTATAAGT CAAGTTGCGC TTAGCTTGAT AATAGGCCAC CTTAGAAAAA TCACTGTCTT 204 0 

TTTTGCCACT TGTTGAAAGT GGCTCCACTG TTGGTAAAAT GAGAGGATTG ATTTCTGCTT 2100 

TTTTGCTTGC AATTTGAGAA GCATCTAGCA TTGTTCCTCT TTCTTCAAAG GATTCCTTGC 2160 



